AFIT/DS/ENC/98J-1 


Representations,  Approximations,  and  Algorithms  for 
Mathematical  Speech  Processing 


DISSERTATION 
Laura  R.  C.  Suzuki 
Major,  USAF 


Approved  for  public  release;  distribution  unlimited 






Approved: 


Representations,  Approximations,  and  Algorithms  for 
Mathematical  Speech  Processing 

Laura  R.  C.  Suzuki,  B.S.,  M.S. 

Major,  USAF 


Dr.  Gregory  T.  Warhola,  Research  Advisor 


Dr.  Alan  V.  Lair 


Dr.  Paul  I.  King,  Dean’s  Representative 




Dean  Robert  A.  Calico,  Jr 


Graduate  School  of  Engineering 




Representations,  Approximations,  and  Algorithms  for 
Mathematical  Speech  Processing 


DISSERTATION 


Presented  to  the  Faculty  of  the  Graduate  School  of  Engineering 
of  the  Air  Force  Institute  of  Technology 
Air  University 
In  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of 
Doctor  of  Philosophy 
in  Engineering 


Laura  R.  C.  Suzuki,  B.S.,  M.S.
Major,  USAF 


16  June  1998 


Approved  for  public  release;  distribution  unlimited 




Abstract 

Presented in this document is work leading to a mathematical frame tailored to
speech  processing.  This  development  is  based  on  work  in  three  loosely-connected  areas  of 
mathematics. 

The first area involves Carleson inequalities and representations in the Hardy spaces,
$H^p(\mathbb{D})$, for p > 1. The main result of this work is the extension of a Carleson inequality
theorem and a representation theorem based on it.

The second area involves mathematical frames. A frame for $L^2(\mathbb{R})$ is developed that is
particularly useful for representing functions with time-varying features. Specific examples
of this frame can be created adaptively to fit specific functions, for example, for use in
data compression. Also in this area, a more general representation of the frame operator is
presented. This alternate representation is particularly useful in that it allows for hybrid
frame coefficient calculations, where the coefficients are found through a combination of
iterative and exact techniques.

The third area, closely related to the first, concerns developing frames for the Hilbert
space, $H^2(\mathbb{D})$. The representation theorem proven for the Hardy spaces, $H^p(\mathbb{D})$, is used to
create frames for $H^2(\mathbb{D})$.

The results from these three areas are combined to create a class of frames for $L^2(\mathbb{R})$
which is well tailored to the characteristics of speech. Experiments were conducted using
a computer program based on this class of frames to validate the applicability of this work
to speech processing.


Acknowledgements 


I  wish  to  thank  my  research  advisor,  Dr  Gregory  T.  Warhola,  who  stuck  by  me 
through  some  rough  times.  It  would  have  been  far  easier  for  him  to  give  up  on  me,  but  he 
did  not. 

I  could  not  have  completed  this  effort  without  the  help  of  my  research  committee 
members,  Dr  Alan  Lair,  Dr  Mark  Oxley,  Dr  Marty  Desimio,  and  Dr  Paul  King.  Without 
their  help  and  encouragement,  I  would  have  been  unable  to  complete  the  final  stage  of  this 
effort. 

I  owe  a  great  deal  of  thanks  to  my  husband,  Yoshiaki,  for  keeping  things  running 
smoothly  at  home  while  I  was  preoccupied  with  research.  My  son  Ian  earned  thanks  by 
demanding  some  of  my  attention  for  himself,  and  thereby  helping  me  to  keep  my  life  in 
balance. 

Over  the  years,  my  fellow  students  have  been  a  source  of  support,  guidance,  and 
inspiration.  Having  others  around  me  “in  the  same  boat”  made  it  easier  to  keep  my  own 
stumbling stones and set-backs in the proper perspective. The ones ahead of me reminded
me  that  the  task  at  hand  was  achievable.  Those  behind  me  reminded  me  how  far  I  had 
come.  A  few  of  those  many  students  are  Bruce  Anderson,  John  and  Cheryl  Columbi, 
Kimberly  Balkema  Demoret,  Amy  Magnus,  Rob  Reid,  Brian  Smith,  and  Terry  Wilson. 

I owe special thanks to Dr Steve Rogers and Dr Matthew Kabrisky, who, although
they were not on my committee, could always be counted on for support and
encouragement.


Laura  R.  C.  Suzuki 


Table  of  Contents 


Page 

Abstract .  ii 

Acknowledgements .  iii 

List  of  Figures .  viii 

List  of  Tables .  xi 

List  of  Symbols .  xii 

I.  Introduction .  1 

II.  Background . 3 

2.1  Speech  .  3 

2.1.1  The  speech  production  process .  3 

2.1.2  Speech  production  models .  6 

2.2  Human  Hearing  and  Speech  Perception .  8 

2.2.1  The  physical  auditory  system .  8 

2.2.2  Simple  auditory  model .  9 

2.3  Distortion  Measures .  10 

2.3.1  Descriptions  of  objective  speech  quality  measures  ....  10 

2.3.2  Evaluations  of  objective  speech  quality  measures  ....  14 

2.4  Important  characteristics  of  speech .  15 

III.  Representation  of  elements  of  the  Hardy  spaces,  Hp .  17 

3.1  Introduction .  17 

3.2  Preliminaries .  20 

3.2.1  Hardy  Spaces .  20 


3.2.2  The  pseudo-hyperbolic  metric,  ρ .  21

3.3  Lemmas  concerning  various  inequalities .  25 

3.4  Supporting  Lemmas  and  Theorems .  28 

3.5  Main  result .  40 

3.6  Representation  of  elements  of  the  Hardy  spaces,  Hp(D)  .  47

3.7  Summary .  52 

IV.  Frames  for  L2( R)  and  a  frame-like  operator .  53 

4.1  Frame  and  frame  operator  properties  .  53 

4.2  A  frame  for  L2(R) .  58 

4.3  A  generalization  of  the  frame  operator .  65

4.4  Summary .  73 

V.  Representation  in  H2  (D) .  74 

5.1  Blaschke  Products .  75 

5.2  Frames  for  H2 (D)  76 

5.3  Projections  into  the  H2 (D)  frame .  94 

5.4  Summary .  98 

VI.  Application  to  speech  representation .  99 

6.1  A  frame  for  speech .  99 

6.1.1  Description  of  the  speech  frame .  99 

6.1.2  Isometry  between  H2(D)  and  L2(R+) .  100

6.1.3  Frame  for  speech .  102 

6.1.4  Estimates  for  the  bounds  A  and  B .  103 

6.1.5  Frames  from  H2(D)  105 

6.2  Using  the  frame . 109 

6.2.1  Function  represented  by  the  sampled  data .  110 

6.2.2  Dual  frame  to  a  windowed  frame .  111




6.2.3  Inner  products  with  windowed  frame  elements  .  114 

6.2.4  Representation  of  frame  elements,  ^nBn .  114 

6.2.5  Choice  of  points  {a_{n,k}} .  115

6.3  The  computer  program .  115 

6.3.1  Basis  set  selection .  116 

6.3.2  Determination  of  offset  times  and  analysis  window  sizes  116 

6.3.3  A  parameter  for  internal  rescaling .  117 

6.3.4  Basis  selection .  117 

6.3.5  Approximation  coefficients  .  118 

6.4  Summary .  118 

VII.  Computer  experiments .  120 

7.1  Fine-scale  analyses .  121 

7.1.1  Description  of  fine-scale  analyses .  121 

7.1.2  Discussion  of  results .  123 

7.2  Medium-scale  analyses .  150 

7.2.1  Description  of  medium  scale  analyses .  150 

7.2.2  Discussion  of  results .  152 

7.3  Large-scale  analysis .  179 

7.3.1  Description  of  large  scale  analyses .  179 

7.3.2  Discussion  of  results .  182 

7.4  Summary .  198 

VIII.  Conclusions .  199 

Appendix  A.  A  Space  of  Speech .  201 

A.1  Abstract  Speech  Space .  201

A.1.1  Desired  characteristics  of  the  space .  201

A.1.2  Description  of  the  Abstract  Speech  Space,  (S,  d)  ....  202



A.2  Linear  System  Background .  211

A.2.1  Linear  Systems .  212 

A.2.2  Linear  System  Representation  of  Speech .  221 

Appendix  B.  Additional  proofs  .  224 

B.1  Additional  proofs  from  Chapter  III .  224

B.2  Proof  of  Theorem  3.6.1  .  232 

Appendix  C.  Heuristic  algorithm  for  finding  glottal  pulses .  239 

Bibliography .  244 




List  of  Figures 

Figure  Page 

1.  Glottal  pulses .  4 

2.  Whispered  excitation .  5 

3.  Formant  structure  and  effects .  6 

4.  Simple  speech  production  model .  7 

5.  Data  used  for  fine-scale  analysis .  124 

6.  Fourier  transform  of  data  used  for  fine-scale  analysis .  125 

7.  Approximation  sequence  for  phoneme  /IY/ .  128

8.  Fourier  transform  of  approximation  sequence  for  phoneme  /IY/ .  129 

9.  Approximation  sequence  for  phoneme /OY/ .  130 

10.  Fourier  transform  of  approximation  sequence  for  phoneme  /OY/ .  131 

11.  Approximation  sequence  for  phoneme /S/ .  132 

12.  Fourier  transform  of  approximation  sequence  for  phoneme  /S/ .  133 

13.  Approximation  sequence  for  blocks  .  134 

14.  Fourier  transform  of  approximation  sequence  for  blocks .  135 

15.  Approximation  sequence  for  bumps .  136 

16.  Fourier  transform  of  approximation  sequence  for  bumps .  137 

17.  Approximation  sequence  for  bumpstx .  138 

18.  Fourier  transform  of  approximation  sequence  for  bumpstx .  139 

19.  Approximation  sequence  for  doppler .  140 

20.  Fourier  transform  of  approximation  sequence  for  doppler .  141 

21.  Approximation  sequence  for  heavisine .  142 

22.  Fourier  transform  of  approximation  sequence  for  heavisine .  143 

23.  Number  of  poles  vs.  L2  error .  144 

24.  Data  offset  vs.  L2  error  (/IY/,  one  pulse) .  146 

25.  Data  offset  vs.  L2  error  (/OY/,  one  pulse) .  147 



26.  Data  offset  vs.  L2  error  (/IY/,  two  pulses) .  148

27.  Data  offset  vs.  L2  error  (/OY/,  two  pulses) .  149

28.  Speech  data  segments  used  in  medium-scale  analysis .  152 

29.  Spectrograms  of  speech  segments  used  in  medium-scale  analysis .  153 

30.  Non-speech  data  segments  used  in  medium-scale  analysis .  154 

31.  Poles  chosen  for  approximations  (phoneme  /IY/,  three  pole  approximations)  156

32.  Poles  chosen  for  approximations  (phoneme  /IY/,  six  pole  approximations)  157 

33.  Poles  chosen  for  approximations  (phoneme  /IY/,  10  pole  approximations)  158 

34.  Poles  chosen  for  approximations  (phoneme  /IY/,  32  pole  approximations)  159 

35.  Poles  chosen  for  approximations  (phoneme  /OY/,  three  pole  approximations)  160 

36.  Poles  chosen  for  approximations  (phoneme  /OY/,  six  pole  approximations)  161 

37.  Poles  chosen  for  approximations  (phoneme  /OY/,  10  pole  approximations)  162 

38.  Poles  chosen  for  approximations  (phoneme  /OY/,  32  pole  approximations)  163

39.  Number  of  iterations  vs.  normalized  error  (phoneme  /IY/,  three  pole  approximations)  .  164

40.  Number  of  iterations  vs.  normalized  error  (phoneme  /IY/,  six  pole  approximations)  .  165

41.  Number  of  iterations  vs.  normalized  error  (phoneme  /IY/,  10  pole  approximations)  .  166

42.  Number  of  iterations  vs.  normalized  error  (phoneme  /IY/,  32  pole  approximations)  .  167

43.  Number  of  iterations  vs.  normalized  error  (phoneme  /OY/,  three  pole  approximations)  .  168

44.  Number  of  iterations  vs.  normalized  error  (phoneme  /OY/,  six  pole  approximations)  .  169

45.  Number  of  iterations  vs.  normalized  error  (phoneme  /OY/,  10  pole  approximations)  .  170

46.  Number  of  iterations  vs.  normalized  error  (phoneme  /OY/,  32  pole  approximations)  .  171



47.  Number  of  iterations  vs.  normalized  error  (phoneme  /OY/,  non-aligned)  .  172 

48.  Number  of  iterations  vs.  normalized  error  (phoneme  /OY/  reversed,  non-aligned)  .  173

49.  Number  of  iterations  vs.  normalized  error  (Lenna,  non-aligned) .  174

50.  Number  of  iterations  vs.  normalized  error  (phoneme  /OY/,  aligned)  ....  175 

51.  Error  for  approximations  of  various  non-speech  and  speech  samples  ....  176 

52.  Segments  used  in  Lenna  and  /OY/  (non-aligned) .  177 

53.  Fourier  transform  of  segments  used  in  Lenna  and  /OY/  (non-aligned)  .  .  .  178 

54.  Spectrograms  of  clean  and  noisy  speech  (sal) .  180 

55.  Spectrograms  of  clean  and  noisy  speech  (sxl94) .  181 

56.  Error  for  approximations  of  clean  and  noisy  speech  (sal,  6  dB  SNR)  .  .  .  189 

57.  Error  for  approximations  of  clean  and  noisy  speech  (sxl94,  6  dB  SNR)  .  .  190

58.  Error  for  approximations  of  clean  and  noisy  speech  (sal,  varying  SNR)  .  .  191

59.  Error  for  approximations  of  clean  and  noisy  speech  (sxl94,  varying  SNR)  .  192

60.  Spectrograms  of  approximations  of  clean  speech .  193 

61.  Spectrograms  of  approximations  of  noisy  speech .  194 

62.  Spectrograms  of  approximations  for  different  noise  levels .  195 

63.  Spectrograms  of  approximations  of  clean  speech  with  differing  window  overlaps .  196

64.  Time  varying  glottal  excitation  shape  and  impulse  response .  205 

65.  Weighting  Function  used  in  Glottal  Pulse  Finding  Heuristic  Algorithm  .  .  240 

66.  Glottal  Pulses  Found  in  Samples  of  Clean  Speech .  242 

67.  Glottal  Pulses  Found  in  Samples  of  Noisy  Speech .  243 




List  of  Tables 


Table  Page 

1.  Minimum  and  maximum  error  for  different  analysis  window  offsets  ....  145 

2.  Compression  ratios .  182 

3.  L2  difference  between  clean  speech  and  approximations .  197




List  of  Symbols 


Symbol  Description 

$\partial\mathbb{D}$  Boundary of $\mathbb{D}$, i.e., the unit circle in $\mathbb{C}$

$\#\{\cdot\}$  Cardinality of set $\{\cdot\}$

$\lceil\cdot\rceil$  Ceiling function, i.e., the least integer upper bound

$1_X$  Characteristic function on the set $X$

$\overline{X}$  Closure of the set $X$

$\mathbb{C}$  The set of complex numbers

$\mathbb{C}^+$  The subset of $\mathbb{C}$ such that for $z \in \mathbb{C}^+$, $\mathrm{Re}(z) > 0$

$\mathbb{D}$  Open unit disk in $\mathbb{C}$

$H^p(\mathbb{D})$  The Hardy space ($0 < p < \infty$) on $\mathbb{D}$ with norm
$\|f\|_p = \sup_{0<r<1} \left( \int_{\partial\mathbb{D}} |f(rz)|^p \, d\sigma(z) \right)^{1/p}$

$H^\infty(\mathbb{D})$  The Hardy space on $\mathbb{D}$ with norm $\|f\|_\infty = \sup_{z \in \mathbb{D}} |f(z)|$

$H^2(\mathbb{C}^+)$  The Hilbert space on $\mathbb{C}^+$ with norm
$\|f\|^2_{H^2(\mathbb{C}^+)} = \lim_{x \to 0^+} \int_{-\infty}^{\infty} |f(x + iy)|^2 \, dy$

$L^2(X)$  The Hilbert space of Lebesgue measurable, 2-integrable functions
over $X$ with norm $\|f\|^2_{L^2} = \int_{t \in X} |f(t)|^2 \, dt$

$L^p(X)$  The Banach space of Lebesgue measurable, p-integrable functions
($0 < p < \infty$) over $X$ with norm $\|f\|^p_{L^p} = \int_{t \in X} |f(t)|^p \, dt$

$L^\infty(X)$  The Banach space of Lebesgue measurable, essentially bounded
functions over $X$ with norm $\|f\|_{L^\infty} = \operatorname{ess\,sup}_{t \in X} |f(t)|$

$K(a,r)$  Pseudo-hyperbolic ball in $\mathbb{D}$ about point $a$ of radius $r$

$\rho$  Pseudo-hyperbolic metric on $\mathbb{D}$

$\mathbb{R}$  The set of real numbers

$\mathbb{R}^+$  The set of positive real numbers

A  Uniform  convergence 

$\mathbb{Z}$  The set of integer numbers

$\mathbb{Z}^+$  The set of positive integer numbers, a.k.a., the natural numbers




Representations,  Approximations,  and  Algorithms  for 
Mathematical  Speech  Processing 


I.  Introduction 

The  motivation  driving  this  work  was  the  desire  for  a  mathematical  construct  which 
is  suitable  for  the  representation  of  human  speech.  This  goal  resulted  in  mathematical  work 
being  done  in  three  different  areas  of  mathematics.  Since  the  connection  among  these  areas 
is  not  obvious,  being  defined  by  the  desired  representation  for  speech,  an  attempt  will  be 
made  to  provide  a  clear  road  map  in  this  introduction. 

The work presented in this document touches on three main areas, and culminates
with a mathematical frame for $L^2(\mathbb{R})$ which is tailored to the representation of speech.
The areas of work are representation of the Hardy spaces, $H^p(\mathbb{D})$, for p > 1, mathematical
frames for $L^2(\mathbb{R})$, and mathematical frames for the Hardy space $H^2(\mathbb{D})$.

Since the concept of a mathematical frame, which is relatively new, is referred to
frequently in this introduction, a brief description of frames and why they are useful is
appropriate here. A mathematical frame in a separable Hilbert space is a special kind of set
of vectors which may be non-orthogonal and may be over-complete, i.e., linearly dependent.
It is a spanning set in that any element of the Hilbert space may be represented as a linear
combination of elements of the frame. While a frame may be over-complete, it cannot be
too much so: the total energy of the frame coefficients must stay within a fixed multiple
of the energy of the element being represented. A more rigorous definition of the frame
and its associated frame operator is given in Definition 4.1.1.
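
As an informal illustration of these ideas (a sketch only; it does not reproduce Definition 4.1.1 or any frame constructed in this work), consider the standard textbook example of three unit vectors in the plane spaced 120 degrees apart. They are linearly dependent and non-orthogonal, yet every vector f satisfies the usual frame inequality A||f||^2 <= sum_k |<f, e_k>|^2 <= B||f||^2 with A = B = 3/2, so f can be recovered from its three coefficients. The function names below are hypothetical.

```python
import math

def mercedes_benz_frame():
    """Three unit vectors in R^2, 120 degrees apart (a textbook tight frame)."""
    return [(math.cos(math.pi / 2 + 2 * math.pi * k / 3),
             math.sin(math.pi / 2 + 2 * math.pi * k / 3)) for k in range(3)]

def inner(u, v):
    """Euclidean inner product on R^2."""
    return u[0] * v[0] + u[1] * v[1]

def frame_operator(frame, f):
    """Apply S f = sum_k <f, e_k> e_k."""
    coeffs = [inner(f, e) for e in frame]
    return (sum(c * e[0] for c, e in zip(coeffs, frame)),
            sum(c * e[1] for c, e in zip(coeffs, frame)))

frame = mercedes_benz_frame()
f = (0.7, -1.3)

# For this frame S = (3/2) I, so inverting the frame operator is a division.
Sf = frame_operator(frame, f)
reconstructed = (Sf[0] / 1.5, Sf[1] / 1.5)

# Energy of the coefficients; equals (3/2) * ||f||^2 for a tight frame.
energy = sum(inner(f, e) ** 2 for e in frame)
```

For a frame that is not tight, the frame operator is no longer a multiple of the identity and its inversion becomes the main computational cost, which is the setting the alternate frame operator representation of Chapter IV addresses.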

I  felt  that  a  basic  building  block  was  needed  for  the  desired  speech  representation. 
This  building  block  should  capture  some  of  the  basic  characteristics  of  human  speech  over 
short  periods  of  time.  In  addition,  these  building  blocks  should  be  compatible  with  the  real 
world  constraints  imposed  by  the  way  speech  is  recorded  for  speech  processing  purposes. 
Such building blocks were found as the result of an extension to a representation theorem
for the Hardy spaces, $H^p(\mathbb{D})$. These building blocks can be used via a frame for $H^2(\mathbb{D})$
based on them. By way of the Laplace transform, such a frame becomes a frame for
$L^2(\mathbb{R}^+)$. As a result, a frame is developed which is tailored to the representation of short
segments of recorded speech.




To represent longer segments of speech, the short-duration building blocks needed to
be combined in some appropriate way. Independent of the $H^p(\mathbb{D})$ and $L^2(\mathbb{R}^+)$
representations above, a frame for $L^2(\mathbb{R})$ is developed based on a frame or set of frames
for $L^2(\mathbb{R}^+)$. This construct has properties which make it suitable for effective speech
processing.

Combining  these  two,  one  has  a  frame  which  can  well  represent  the  slowly  time- 
varying,  harmonic  structure  of  speech,  while  still  being  able  to  represent  the  non-harmonic 
components  of  speech. 

Additional results were found which aid in the numerical processing involved in finding
frame representations of specific examples of recorded speech. An alternate representation
of the frame operator is given which lends itself to more stable and efficient computation
of frame representations. By viewing the operator as the sum of weighted projections
onto subspaces, it becomes possible to recursively nest frames, allowing for more tractable
inverse frame operator computations. This representation is especially useful in working
with the above-mentioned speech frame for lengthy segments of speech.

To summarize, basic building blocks for speech representation are found via
representations in $H^p(\mathbb{D})$. These building blocks are combined into a construct useful for speech
representation using frames for $L^2(\mathbb{R}^+)$ and $L^2(\mathbb{R})$. The resulting construct - a frame
suitable for speech representation - is made more practical by an alternate representation of
frame operators which yields numerically more tractable frame calculations.

Chapter II discusses some useful background on speech and hearing - useful in that
there are physiological considerations when developing speech representations for auditory
use. Chapter III presents an extension to an $H^p(\mathbb{D})$ representation theorem. Chapter IV
presents a frame for $L^2(\mathbb{R})$ as well as the alternate representation of the frame operator.
The frame for $H^2(\mathbb{D})$ is presented in Chapter V. These results are combined in Chapter VI
to yield a frame for speech processing. The results of experiments done with a computer
implementation based on this construct are presented in Chapter VII.




II.  Background 


In  order  to  work  well  with  speech  signals,  one  must  have  some  understanding  of 
how  speech  is  created  and  perceived.  This  is  particularly  pertinent  here  since  many  of 
the  “obvious”  approaches  to  working  with  signals  can  lead  to  perceptually  unacceptable 
results  when  applied  to  speech. 

A review of the speech production process will now be given. Following that, some
of the more successful speech production models will be reviewed. In addition, the human
auditory system and its characteristics will be discussed along with some of the simple
auditory models. Finally, existing distortion measures and their performance will be discussed.

2.1  Speech

In  order  to  represent  speech  well,  one  must  have  some  understanding  of  how  it  is 
produced.  This  section  describes  the  speech  production  process,  the  characteristics  that 
this  process  imposes  on  speech  and  some  of  the  simple,  yet  effective,  speech  production 
models. 

Before beginning these discussions, it will be helpful to describe a very useful tool
in speech analysis, the spectrogram. The spectrogram is a three-dimensional plot of time
vs. frequency with the third dimension being the spectrum value at that time and
frequency. The spectrum is the magnitude of the windowed Fourier transform. In a
spectrogram, the magnitude is usually represented with color or gray-scale intensity, yielding
two-dimensional pictures that are meaningful and easily interpreted. The primary reason
for the usefulness of the spectrogram in speech analysis is that it, in a very real sense,
mimics the frequency analysis which is done in the ear, which will be discussed in more
detail later.
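
The following minimal sketch shows one way such a spectrogram could be computed: the signal is cut into overlapping frames, each frame is multiplied by a window, and the magnitude of its discrete Fourier transform is kept. The Hann window, the frame length, the hop size, the 8 kHz sampling rate, and the test tone are all assumptions chosen for illustration; they are not taken from the experiments reported later.

```python
import cmath
import math

def spectrogram(signal, window_len, hop):
    """Magnitude of the windowed DFT of successive frames.

    Returns a list of frames; each frame holds the magnitudes of
    frequency bins 0 .. window_len // 2.
    """
    # Hann window (an assumed choice; any smooth taper would do).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_len - 1))
              for n in range(window_len)]
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(window_len)]
        mags = []
        for k in range(window_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                      for n in range(window_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

fs = 8000          # assumed sampling rate (Hz)
tone_hz = 1000     # assumed test tone
signal = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(512)]

spec = spectrogram(signal, window_len=64, hop=32)

# The strongest bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64
```

Plotting the frames side by side, with time on one axis, bin frequency on the other, and magnitude as intensity, gives the gray-scale picture described above.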

2.1.1  The  speech  production  process  [23].  The  physical  speech  production  process 
is  quite  complicated  and  involves  the  entire  respiratory  system.  However,  speech  can,  quite 
simply,  be  thought  of  as  the  output  produced  when  a  stream  of  air  produced  by  the  lungs 
is  modified  by  the  various  parts  of  the  vocal  tract.  Both  the  shaping  of  the  vocal  tract  and 
the  characteristics  of  the  source  air  stream  can  be  independently  changed  (to  some  extent). 
These  changes  modify  the  characteristics  of  the  air  stream,  which  carry  the  information 
that  is  contained  in  “speech.” 

The whole vocal tract plays a part in the modifications to the air stream, but there
are two main classes of modification. The first is excitation, the process by which frequency
content is added. The second class of modification is caused by linear filtering (sometimes
referred to as modulation [23]), by which the shape of the vocal tract induces changes in
this frequency content. The nature of these changes as well as the nature of the underlying
frequencies are what convey the information content in speech to the listener.

Figure 1.  Simplified glottal pulses. (a) Amplitude vs. Time. (b) Spectrum Amplitude
vs. Frequency.

2.1.1.1  Excitation sources.  The most important excitation source is the
vocal cords. There are two main modes of excitation produced by the vocal cords. In the
first, called phonation, the vocal cords repeatedly open and close. This interruption of
the airflow produces air pulses, called glottal pulses. Usually, these pulses occur at regular
intervals, but can become irregular, especially during unstressed portions of speech and
near the ends of sentences. The shape of the glottal pulses varies greatly, from speaker to
speaker, with pitch (fundamental voice frequency), etc. [23:115]. Because glottal pulses are
usually regularly spaced, they add mainly harmonic frequency content to speech. Speech
in which phonation takes place is called voiced, and that without, unvoiced. Figure 1 shows
some simplified glottal pulses and the magnitude of their Fourier transform.
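
The link between regular spacing and harmonic content can be checked numerically. The sketch below idealizes each glottal pulse as a single unit impulse (real pulses have an extended shape, as noted above) and assumes an 8 kHz sampling rate with one pulse every 80 samples, i.e., a 100 Hz pitch. Under these assumptions the spectrum is nonzero only at integer multiples of the pitch.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitudes of DFT bins 0 .. len(x) // 2 of a real sequence."""
    n_samples = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples)))
            for k in range(n_samples // 2 + 1)]

fs = 8000            # assumed sampling rate (Hz)
pitch_period = 80    # assumed spacing between pulses -> 100 Hz pitch
n_samples = 800      # ten pitch periods

# Idealized excitation: one unit impulse per pitch period.
pulse_train = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]

mags = dft_magnitude(pulse_train)

# Bins carrying energy, and the frequencies they correspond to.
harmonic_bins = [k for k, m in enumerate(mags) if m > 1e-6]
harmonic_hz = [k * fs / n_samples for k in harmonic_bins]
```

Replacing the impulses by a smoother pulse shape would change the relative heights of the harmonics but not their locations, as long as the pulses remain evenly spaced.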

The  second  main  mode  of  excitation  from  the  vocal  cords  is  called  whispering.  In  this 
mode,  the  vocal  cords  are  contracted,  but  not  closed,  so  that  only  a  small  opening  remains 
for  the  air  to  go  through.  This  produces  turbulence  which  causes  wideband  noise.  This 
noise  is  the  basis  for  whispered  speech  and  for  normal  unvoiced  speech.  Figure  2  shows  a 
simplification  of  such  whispered  excitation. 

Phonation and whispering form the basis for most speech sounds. However, there
are other sources of excitation in the vocal tract also. Three important ones are called
frication, compression, and vibration, and are mainly used to form different consonants.


Figure 2.  Whispered excitation. (a) Amplitude vs. Time (msec). (b) Spectrum Amplitude
vs. Frequency.

Frication is caused when the vocal tract is tightly constricted at some point other
than at the vocal cords (e.g., tongue/roof-of-mouth, lips, etc.). The air flow past the
constriction causes turbulence, which causes broadband noise (fricative sounds).

Compression  is  caused  when  the  vocal  tract  is  completely  closed  at  some  point  other 
than  at  the  vocal  cords.  The  release  of  the  pressure  can  be  abrupt  (plosive  sounds)  or  can 
blend  together  with  a  following  fricative  (affricative  sounds). 

Vibration  is  caused  by  air  being  forced  through  a  closure  other  than  at  the  vocal 
cords.  These  vibrations  add  additional  frequency  content  to  the  sound. 

2.1.1.2  Frequency Modification.  The frequency content of the sound is
further modified by the shape of the vocal tract itself. The vocal tract transmits certain
frequencies more efficiently than others, depending upon its shape. Relative peaks in
frequency transmission are called formants. As the shape of the vocal tract is changed, the
amount of each frequency which is passed changes also, which changes the sound of the
speech. Figure 3 shows an example of a formant structure (shown in the Fourier domain)
and the effect it has on the frequency content of a stream of glottal pulses.
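
A single formant can be imitated, very roughly, by a two-pole resonator. The sketch below evaluates the gain of such a resonator at the harmonics of a 100 Hz pitch and finds the harmonic that is passed most efficiently. The 500 Hz formant frequency, the pole radius 0.97, and the 8 kHz sampling rate are assumed values for illustration, and a real vocal tract has several formants rather than one.

```python
import cmath
import math

def resonator_gain(freq_hz, formant_hz, pole_radius, fs):
    """Gain |H| of H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2),
    with theta set by the formant frequency and r the pole radius."""
    theta = 2 * math.pi * formant_hz / fs
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    denom = (1 - 2 * pole_radius * math.cos(theta) * z_inv
             + pole_radius ** 2 * z_inv ** 2)
    return 1 / abs(denom)

fs = 8000                                     # assumed sampling rate (Hz)
harmonics = [100 * k for k in range(1, 40)]   # harmonics of a 100 Hz pitch

# Gain applied to each harmonic by one assumed formant at 500 Hz.
gains = [resonator_gain(f, 500, 0.97, fs) for f in harmonics]

# Harmonic that the resonator emphasizes most.
strongest = harmonics[max(range(len(gains)), key=gains.__getitem__)]
```

Moving the assumed formant frequency shifts which harmonics are emphasized while leaving the harmonic locations unchanged, which is the effect the figure illustrates.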

This formant information is of primary importance in the perception of vowels, and
vowels are conveyed almost solely in the formant information. However, it is also an
important clue to the identity of consonants, because the consonant adjacent to a vowel
affects the formant structure “at the edges” of the vowel [21]. This inter-modification of
adjacent sounds is called co-articulation. So, although most languages transfer most of
their information content in the consonants, the adjacent (linguistically less important [23])
vowels convey important clues about the identity of the consonants [23].


Figure 3.  Formant structure and its effects on the frequency content in voiced speech. (a)
Formant structure in frequency domain, (b) Effect on spectrum of simplified
voiced excitation, (c) Original excitation, (d) Excitation after linear filtering.
Axes: Amplitude vs. Frequency (Hz) and Amplitude vs. Time (msec).

While  the  speech  production  process  is  quite  complex,  some  of  the  successful  models 
of  it  are  quite  simple.  The  following  section  discusses  some  of  the  simpler  speech  production 
models  and  the  simplifying  assumptions  employed  to  create  them. 

2.1.2  Speech  production  models.  There  are  two  differing  philosophies  on  modeling 
speech  production.  One  way  is  to  model  the  actual  vocal  tract  (shape,  motions,  etc.)  and 
the  motion  of  the  air  through  it.  This  is  quite  difficult  and  computationally  expensive, 
and  until  recently,  the  computing  resources  necessary  to  do  such  work  were  not  readily 
available  [13].  However,  experience  in  speech  synthesis  has  shown  reasonably  good  results 
can  be  achieved  by  ignoring  the  actual  vocal  tract  and  the  complexities  of  the  air  flow 
through  it.  Instead,  the  glottal  excitation  can  be  simplified  to  a  one-dimensional  input 
and  the  effects  of  the  vocal  tract  simplified  to  the  effects  of  a  linear  system  on  that  input. 

Figure 4.  Simple speech production model. Vocal tract modeled as a linear system with
an input of “glottal excitation” and producing as output, “speech.”

The simplest models take the form shown in Figure 4. Here, the excitation source
input to the “black box” vocal tract is the air flow from the lungs after it has been excited
by the vocal cords. Often, this glottal excitation is assumed to take the form of a train of
pulses (for voiced speech) or white noise (for unvoiced speech), but current work continues
in the development of new glottal excitation models [27]. The vocal tract is modeled as a
causal, linear, time-invariant system, usually as one that can be represented by an all-pole
or a pole-zero model. The output (speech) from a given input (excitation) and fixed vocal
tract configuration at time t is given by

$$ s(t) = \int_0^t x(\tau)\, h(t - \tau)\, d\tau $$

where s is the speech signal, x is the excitation source, and h is the impulse response of
the vocal tract. Notice that h is time-invariant, so to model speech (where the vocal tract
varies with time) using this method would require that the output produced by different
values of h be spliced together in some way.
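
A discrete-time analogue of this relation may help fix ideas. In the sketch below the integral becomes a finite sum, the excitation is an assumed impulse train, and the vocal tract is reduced to the simplest possible all-pole system, a single real pole a = 0.9 (an illustrative value, far simpler than any realistic vocal tract). The convolution sum and the equivalent one-pole recursion give the same output.

```python
def convolve_causal(x, h):
    """Discrete analogue of the integral: s[n] = sum_{m=0}^{n} x[m] h[n - m]."""
    return [sum(x[m] * h[n - m] for m in range(n + 1) if n - m < len(h))
            for n in range(len(x))]

def one_pole_filter(x, a):
    """Recursive form of the same system: y[n] = x[n] + a * y[n - 1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y

a = 0.9                                                 # assumed pole location
x = [1.0 if n % 20 == 0 else 0.0 for n in range(100)]   # assumed pulse excitation
h = [a ** n for n in range(100)]                        # impulse response of the pole

s_conv = convolve_causal(x, h)
s_rec = one_pole_filter(x, a)

# Largest disagreement between the two ways of computing the output.
max_diff = max(abs(u - v) for u, v in zip(s_conv, s_rec))
```

Because h is fixed, changing the vocal tract configuration in this form means switching to a different h partway through and joining the resulting outputs.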

It is more desirable to model the vocal tract as a time-varying linear system. For
such a system, the output is given by

$$ s(t) = \int_0^t x(\tau)\, v(t, \tau)\, d\tau $$

where x is as given before and $v(t, \tau)$ is the system response at time t to an impulse at
time $\tau$.
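
In discrete time the same idea can be sketched by letting the kernel depend on both the output time n and the impulse time m. The kernel below is purely hypothetical: the response to an impulse applied at time m decays at a rate that depends on m. When the decay rate is held constant, the kernel depends only on n - m and the time-invariant convolution is recovered.

```python
def time_varying_output(x, v):
    """s[n] = sum_{m=0}^{n} x[m] * v(n, m), where v(n, m) is the response
    at time n to a unit impulse applied at time m."""
    return [sum(x[m] * v(n, m) for m in range(n + 1)) for n in range(len(x))]

def make_kernel(decay_at):
    """Hypothetical kernel: an impulse at time m decays as decay_at(m)**(n - m)."""
    return lambda n, m: decay_at(m) ** (n - m)

# Assumed excitation: unit pulses at samples 0, 20, and 40.
x = [1.0 if n % 20 == 0 else 0.0 for n in range(60)]

# Constant decay: reduces to ordinary convolution with h[n] = 0.9**n.
fixed = time_varying_output(x, make_kernel(lambda m: 0.9))

# Decay that drifts with the time of excitation: later pulses die out faster.
drifting = time_varying_output(x, make_kernel(lambda m: 0.9 - 0.004 * m))
```

No splicing of separately computed outputs is needed here, since the change in the system is carried by the kernel itself.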

This  type  of  model  can  quite  easily  model  vowels  and  other  vocalic  sounds.  In  its 
simplest  form,  it  is  widely  used  for  speech  analysis  and  synthesis  [23],  although  it  does 
not  provide  for  the  cases  where  excitation  occurs  at  additional  points  other  than  the  vocal 
cords  (e.g.,  fricatives),  or  where  the  air  flow  characteristics  are  significantly  changed  (e.g., 
plosives).  An  enhanced  model  is  necessary  to  provide  a  more  natural  representation  of 
these,  but  this  is  seldom  done  in  practice,  since  good  results  have  been  achieved  with  the 
simpler  model  [23]. 




2.2  Human  Hearing  and  Speech  Perception 

In  order  to  develop  an  appropriate  representation  of  speech,  it  is  necessary  to  study 
some  of  the  details  of  how  the  ear  and  brain  process  sound.  This  section  discusses  some  of 
the  information  pertinent  to  the  understanding  of  the  human  auditory  system. 

2.2.1  The  physical  auditory  system  [21,  23,  32,  28].  Sound  (compression  waves 
in  air)  enters  the  auditory  system  through  the  meatus  (auditory  canal).  It  is  modified  en 
route  by  the  effects  of  the  pinna  (outer  ear).  These  effects  consist  mainly  of  frequency 
filtering  and  amplitude  reduction.  The  sound  waves  cause  the  tympanic  membrane  (ear 
drum)  to  vibrate.  The  ear  drum  is  connected  to  the  cochlea  at  the  oval  window  through  a 
linkage  of  three  bones.  These  bones  act  as  an  impedance  matching  device  which  matches 
the  acoustical  impedance  between  air  (outside  the  cochlea)  and  cochlear  fluid. 

The  cochlea  is  a  fluid-filled,  spiral-shaped  structure  completely  enclosed  in  bone 
except  for  two  membrane-covered  openings,  the  oval  and  round  windows.  It  is  divided  along 
almost  all  its  length  by  two  membranes,  the  basilar  membrane  and  Reissner’s  membrane. 
These  divide  the  cochlea  into  three  passages,  the  scala  vestibuli,  the  scala  media,  and  the 
scala  tympani.  The  outer  two  of  these  three,  the  scala  vestibuli  and  the  scala  tympani, 
are  connected  at  the  far  end  (farthest  from  the  oval  and  round  windows)  at  an  opening 
called  the  helicotrema.  The  oval  window  interfaces  with  the  scala  vestibuli  and  the  round 
window  with  the  scala  tympani. 

Since  the  cochlea  is  enclosed  in  rigid  bone  except  for  the  membrane  covered  oval 
and  round  windows,  and  is  filled  with  an  (almost)  incompressible  fluid,  any  vibration 
of  the  oval  window  must  cause  a  corresponding  vibration  in  the  round  window.  That 
is,  when  the  oval  window  is  deflected  inward,  the  round  window  is  almost  immediately 
deflected  outward.  Since  the  scala  vestibuli  and  scala  tympani  are  connected  at  the  far 
end,  vibrations  of  the  oval  window  cause  a  corresponding  vibrating  displacement  along  the 
basilar  membrane.  Due  to  the  nature  of  the  auditory  system,  experiments  to  determine  this 
displacement  are  quite  difficult  to  conduct.  However,  the  results  of  such  experiments  have 
shown  that  for  a  constant  frequency  tone  (i.e.,  sinusoidal  wave),  the  pattern  of  displacement 
is  easy  to  categorize.  The  amplitude  of  displacement  gradually  increases  to  a  maximum 
at a frequency-dependent distance along the basilar membrane, and then drops off more
rapidly.  The lower the frequency, the farther along the basilar membrane from the oval
window this maximum occurs.  Thus the frequency content of sound is distributed along
the basilar membrane, in a manner similar to a spectrogram.  The relationship between
displacement along the basilar membrane and frequency is roughly logarithmic above 800
Hz, and becomes more linear at lower frequencies.

Along  the  length  of  the  basilar  membrane,  within  the  scala  media,  is  a  structure 
called  the  organ  of  Corti.  The  tectorial  membrane  also  runs  the  length  of  the  basilar 
membrane  and  lies  above  the  organ  of  Corti.  Hair  cells  projecting  from  the  organ  of  Corti 
touch  the  tectorial  membrane  and  shear  between  the  tectorial  and  the  basilar  membrane 
cause  the  hair  cells  to  move.  This  motion  triggers  nerve  firings,  the  rates  of  which  are 
somewhat  (but  not  entirely)  related  to  the  amplitude  of  the  vibrations. 

Early  experimenters  working  with  cadavers  came  to  the  conclusion  that  the  frequency 
selectivity  of  any  given  location  on  the  basilar  membrane  was  very  low  (i.e.,  any  given 
location  responded  well  to  a  broad  range  of  frequencies);  so  theorists  had  trouble  accounting 
for  the  rather  good  frequency  resolution  that  is  exhibited  by  most  humans.  More  recent 
work  with  live  animal  subjects  has  revealed  that  the  cochlea  is  not  entirely  a  passive 
“spectrum  analyzer.”  It  has  been  found  that  the  auditory  system  actively  tunes  itself, 
so  that  in  a  living  subject  in  good  physiological  condition,  the  frequency  selectivity  of  a 
given  location  on  the  basilar  membrane  is  much  better  than  for  one  in  poor  physiological 
condition  [21].  This  active  tuning  mechanism,  or  at  least  its  effects,  must  be  accounted 
for  in  good  models  of  the  auditory  system. 

Since  each  location  along  the  basilar  membrane  responds  best  to  a  different  frequency, 
the  location  of  maximum  vibration  is  believed  to  be  an  important  source  of  frequency 
information to the brain (the place theory).  However, due to a phenomenon called phase
locking, which, for frequencies less than 5 kHz, causes the nerve cells to fire at approximately
the same phase of a sound wave every time, temporal information on the frequency is
also available.  That is, for a given frequency $f$, all nerve cells corresponding to a frequency
in a small neighborhood of $f$ fire at the frequency $f$.  The theory that this information is
the primary source of frequency discrimination is called the temporal theory.  Some research
indicates that both place and temporal information are used [21, 23, 31].

2.2.2  Simple  auditory  model.  Many  of  the  most  useful  auditory  models  model 
the  displacement  of  the  basilar  membrane  generated  by  a  sound  as  the  output  of  an  array 
of  filter  banks.  Due  to  the  causal  nature  of  hearing  and  the  finite  time  domain  response 
possible  in  the  auditory  system,  a  type  of  filter  chosen  is  a  causal  finite  impulse  response 
(FIR)  filter.  Much  research  has  been  done  to  determine  the  shapes  of  actual  “auditory 
filters”  at  different  locations  along  the  basilar  membrane  [21]. 




Given  the  filter  bank  model  of  the  cochlea,  the  displacement  of  the  basilar  membrane 
due  to  an  input  sound  at  a  distance  s  from  the  cochlea  base  is  given  by 

\[ y_s(t) = (h_s * x)(t) , \]

where $y_s$ is the displacement as a function of time, $t$; $x$ is the input to the ear as a function
of time; $h_s$ is the impulse response corresponding to location $s$ on the basilar membrane;
and the asterisk is used to denote convolution.
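The filter bank model can be sketched in a few lines. This is a minimal illustration, not a model of real auditory filters: the Hann-windowed cosine used for each impulse response, the channel center frequencies, and all function names are assumptions chosen only to make the example self-contained.

```python
import math


def fir_bandpass(center_hz, sample_rate_hz, num_taps):
    """Hypothetical causal FIR 'auditory filter': a Hann-windowed cosine."""
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * (n + 1) / (num_taps + 1)))
        * math.cos(2.0 * math.pi * center_hz * n / sample_rate_hz)
        for n in range(num_taps)
    ]


def convolve(h, x):
    """y[n] = sum_k h[k] x[n - k], truncated to len(x) samples (causal)."""
    return [
        sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
        for n in range(len(x))
    ]


def filter_bank(x, center_freqs_hz, sample_rate_hz, num_taps=64):
    """Return one output per channel, each computed as h_s * x."""
    return [
        convolve(fir_bandpass(fc, sample_rate_hz, num_taps), x)
        for fc in center_freqs_hz
    ]


def energy(y):
    return sum(v * v for v in y)


# A 1 kHz tone sampled at 8 kHz and three hypothetical channel centers.
fs = 8000.0
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(400)]
centers = [500.0, 1000.0, 2000.0]
outputs = filter_bank(tone, centers, fs)
energies = [energy(y) for y in outputs]
```

For the pure tone, the channel centered at the tone frequency carries the most energy, which mirrors the place-dependent maximum of basilar membrane displacement described earlier.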

The  active  tuning  mechanism  of  the  ear  must  also  be  modeled  before  nerve  firing 
rates  can  be  estimated.  Although  the  actual  mechanism  by  which  the  ear  does  this  is  not 
yet  certain,  it  is  sometimes  modeled  as  a  lateral  inhibition  network  (LIN).  In  a  LIN,  strong 
activity  in  one  neuron  can  inhibit  weaker  activity  in  adjacent  neurons,  causing  the  activity 
of  these  neurons  to  decrease  even  more. 

2.3  Distortion  Measures 

Distortion  and  quality  assessment  measures  are  of  great  interest  for  many  commercial 
applications.  For  this  reason,  much  work  has  been  put  into  developing  and  evaluating  such 
measures,  since  accurate  measures  aid  in  the  development  of  high  quality  transmission 
techniques. 

Although  the  only  generally  accepted,  general-purpose  (i.e.,  can  be  used  for  any 
transmission  technique)  quality  assessment  measure  is  to  have  the  system  evaluated  by  a 
panel  of  trained  listeners,  this  type  of  subjective  measure  is  expensive  and  slow.  Therefore, 
accurate  objective  measures,  which  can  be  automated,  are  being  sought. 

Below, different classes of objective quality measures are described and their performance
reviewed.  This information is used to help justify the structure of speech space in
the following chapter.

2.3.1  Descriptions of objective speech quality measures.  Objective speech quality
measures in use today may be placed in four categories.  The first category consists of
measures that compare the actual waveforms of the original and distorted speech.  This
kind of measure is of use only when the distortion does not change the basic shape of the
waveform.  The second category consists of measures that compare parameters based on
production models of speech.  These can be applied to a wider range of distortion types, but
may not work well on distortions that cannot be reproduced using the production model
that is the basis of the measure.  The next category consists of measures based on auditory models.
These work well on a wide range of distortions but are computationally expensive.  The
last category consists of measures that combine multiple measures into one.  The advantage
here is that it is possible to combine different types of measures, which individually may not
perform well, to yield superior performance.

A common enhancement of the above categories is to separate the speech into frequency
bands and to perform the analysis on each frequency band separately.  The results
of each frequency band analysis can then be weighted independently.  Additionally, all
of these techniques can be further tailored by pre-classifying sounds (into, e.g., fricative,
vocalic, and nasal) and using specific measures that perform well with the appropriate
class [25].

An  introduction  to  objective  speech  quality  measures  can  be  found  in  [25],  where 
many  of  the  older  and  recent  measures  (up  through  1985)  are  described  along  with  results 
of  their  extensive  testing.  Except  where  otherwise  indicated,  the  descriptions  below  are 
based  on  those  presented  in  [25],  although  the  categorization  is  mine. 

2.3.1.1  Waveform comparison measures.  Waveform comparison techniques
compare  the  waveforms  of  the  original  and  distorted  speech  directly  and  attempt  to  assess 
the  importance  of  the  differences  between  them.  Hence,  this  sort  of  objective  measure  is 
suitable  for  use  only  with  distortions  for  which  the  waveform  itself  is  well  preserved.  That 
is,  if  the  waveform  is  not  well  preserved  or  is  simply  shifted  in  time,  these  measures  may 
indicate  large  differences  in  perceptually  identical  speech. 

One of the simplest objective quality measures is the $\ell^2$ norm, given by

\[ \|x_\phi - x_d\|_2 = \left( \sum_n |x_\phi(n) - x_d(n)|^2 \right)^{1/2} , \]

where $x_\phi$ and $x_d$ represent the sampled original and distorted signal, respectively.  However,
it yields quite poor results when applied to speech problems, and thus is never used.
Specifically, the energy difference and perceptual difference are not correlated across a
wide range of distortions.
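A small numerical sketch illustrates why a waveform distance can disagree with perception. The signals here are hypothetical (a 100 Hz tone sampled at 8 kHz), and the function name is an assumption for illustration. A pure time shift, which the text treats as perceptually identical speech, produces a far larger distance than a small additive perturbation.

```python
import math


def l2_distance(x_orig, x_dist):
    """Square root of the summed squared sample differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_orig, x_dist)))


# Hypothetical test signal: 100 Hz tone at 8 kHz sampling (80 samples/period).
fs = 8000
x = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(800)]

# Delay by half a period (40 samples): the waveform is nearly inverted,
# so the sample-by-sample difference is large.
delay = 40
x_shifted = [0.0] * delay + x[:-delay]

# Tiny additive offset for comparison.
x_perturbed = [v + 0.001 for v in x]

d_shift = l2_distance(x, x_shifted)
d_small = l2_distance(x, x_perturbed)
```

Here `d_shift` exceeds `d_small` by several orders of magnitude even though the shifted signal is only a delayed copy of the original.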

Much more useful are variants of the signal-to-noise ratio (SNR).  SNR measures
compare the energy of the signal with the energy of the noise (defined as the difference
between the original and distorted signal).  While the classical SNR, defined by

\[ \mathrm{SNR} = 10 \log_{10} \frac{\sum_n |x_\phi(n)|^2}{\sum_n |x_\phi(n) - x_d(n)|^2} , \]

where $x_\phi$ is the original speech signal and $x_d$ is the distorted speech, has been found to
be of little use for speech quality, some segmental variants of the SNR are quite good.
Particularly, the Frequency Weighted segmental SNR, where the SNR is calculated for
different time segments and frequency bands, can be constructed to correlate quite well
with perceptual quality.  These variants are generally of the form


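\[ \mathrm{FWSNR} = \frac{1}{M} \sum_{m} \frac{\sum_j W(m,j) \, 10 \log_{10} \left( \sigma^2_{\phi,m,j} / \sigma^2_{n,m,j} \right)}{\sum_j W(m,j)} , \]

where $M$ is the number of segments, $m$ is the segment index, $j$ is the frequency band index,
$W(m,j)$ is a weight for segment $m$ and frequency band $j$, and $\sigma^2_{\phi,m,j}$ and $\sigma^2_{n,m,j}$ are the
variances for band $j$ and segment $m$ of the original speech and noise, respectively.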
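The two SNR variants can be sketched as follows. This is an illustrative sketch under stated assumptions: the per-band variances are taken as given (how they are computed is left open), the averaging over segments uses a plain mean, and all names and numbers are hypothetical.

```python
import math


def snr_db(x_orig, x_dist):
    """Classical SNR: signal energy over error energy, in dB."""
    signal = sum(a * a for a in x_orig)
    noise = sum((a - b) ** 2 for a, b in zip(x_orig, x_dist))
    return 10.0 * math.log10(signal / noise)


def fw_segmental_snr_db(band_var_signal, band_var_noise, weights):
    """Frequency-weighted segmental SNR.

    Each argument is a list over segments m of lists over bands j:
    signal variances, noise variances, and weights W(m, j).
    """
    total = 0.0
    for var_s, var_n, w in zip(band_var_signal, band_var_noise, weights):
        weighted = sum(
            wj * 10.0 * math.log10(s / n)
            for wj, s, n in zip(w, var_s, var_n)
        )
        total += weighted / sum(w)
    return total / len(weights)


# Hypothetical numbers: two segments, two bands each.
sig = [[1.0, 1.0], [4.0, 1.0]]
noise = [[0.1, 0.01], [0.04, 0.1]]
w = [[1.0, 1.0], [1.0, 3.0]]
fw = fw_segmental_snr_db(sig, noise, w)

classical = snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9])
```

With these numbers the per-band SNRs are 10 and 20 dB in the first segment and 20 and 10 dB in the second, so the weights alone determine that the second segment contributes 12.5 dB rather than 15 dB.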


2.3.1.2  Production model based comparison measures.  Production model
based techniques are generally based on differences between the linear predictive coding (LPC)
coefficients of the original and distorted speech.  The importance of these differences is analyzed in
various ways.

In the so-called LPC Parameter Measures, the speech is segmented in time, and
parameters are calculated for the original and distorted speech based on the LPC coefficients
calculated for each segment.  These parameters are then compared for each segment
according to

\[ d(Q,p,m) = \left( \frac{1}{N} \sum_{k=1}^{N} |Q(k,m,\phi) - Q(k,m,\delta)|^p \right)^{1/p} , \]

where $d(Q,p,m)$ is the distance for segment $m$ using parameters defined by $Q$ and power $p$
for $1 < p < \infty$, $N$ is the number of parameters, and $Q(k,m,\phi)$ and $Q(k,m,\delta)$ are the $k$th
parameters for segment $m$ of the original and distorted speech, respectively.  The distances
for each segment are combined according to

\[ D(p) = \frac{\sum_{m=1}^{M} W(m) \, d(Q,p,m)}{\sum_{m=1}^{M} W(m)} , \tag{1} \]

where $D(p)$ is the overall measure, $M$ is the number of segments, and $W(m)$ is the weight for
segment $m$.  Different measures can be easily developed by producing different parameter
definitions.
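The per-segment distance and its weighted combination can be sketched as below. The parameter values and weights are hypothetical, the function names are assumptions, and the parameters are taken as already computed from the LPC coefficients.

```python
def segment_distance(q_orig, q_dist, p):
    """Per-segment distance: p-th root of the mean p-th power difference."""
    n = len(q_orig)
    total = sum(abs(a - b) ** p for a, b in zip(q_orig, q_dist))
    return (total / n) ** (1.0 / p)


def overall_distance(params_orig, params_dist, weights, p):
    """Weighted average of the per-segment distances."""
    num = sum(
        w * segment_distance(qo, qd, p)
        for w, qo, qd in zip(weights, params_orig, params_dist)
    )
    return num / sum(weights)


# Hypothetical parameters: 2 segments, 3 parameters per segment.
orig = [[0.5, -0.2, 0.1], [0.4, -0.1, 0.0]]
dist = [[0.5, -0.2, 0.1], [0.1, 0.2, 0.3]]
weights = [1.0, 3.0]
d_overall = overall_distance(orig, dist, weights, p=2)
```

The first segment is undistorted and contributes zero; the second has every parameter off by 0.3, so with weights 1 and 3 the overall distance is three quarters of 0.3.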

The log-likelihood ratio measures use the LPC coefficients also, but in a different way.
Here, the assumption is that the speech and distorted speech can both be represented well
by the all-pole vocal tract transfer function model over short segments.  The distance
measure for segment $m$ is given by

\[ d(a_d, a_\phi, m) = \log \left( \frac{a_d R_\phi a_d^T}{a_\phi R_\phi a_\phi^T} \right) , \]

where $a_d$ and $a_\phi$ represent the LPC coefficients for the distorted and original speech, respectively,
$R_\phi$ represents the autocorrelation matrix of the original speech, and the superscript
$T$ denotes the transposition operation.  It can be shown that the resulting value is never
negative.  These values are combined into an overall measure in a similar way to that done
above in (1).
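The non-negativity of the log-likelihood ratio can be checked numerically in a small case. This sketch assumes the coefficient vectors are written as prediction-error filters of the form [1, a1, ...], which is a convention chosen here and not stated in the text. It uses a first-order predictor so that the optimal coefficients have a closed form; the sample values and function names are hypothetical.

```python
import math


def autocorrelation_matrix(x, order):
    """(order+1) x (order+1) Toeplitz autocorrelation matrix of x."""
    r = [
        sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        for lag in range(order + 1)
    ]
    return [[r[abs(i - j)] for j in range(order + 1)]
            for i in range(order + 1)]


def quadratic_form(a, R):
    """Compute a R a^T for a row vector a."""
    return sum(a[i] * R[i][j] * a[j]
               for i in range(len(a)) for j in range(len(a)))


def log_likelihood_ratio(a_dist, a_orig, R_orig):
    return math.log(quadratic_form(a_dist, R_orig)
                    / quadratic_form(a_orig, R_orig))


# Hypothetical short segment of "original" speech.
x = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6]
R = autocorrelation_matrix(x, order=1)

# First-order coefficients minimizing a R a^T over vectors [1, a1].
a_orig = [1.0, -R[0][1] / R[0][0]]
# Hypothetical "distorted" coefficients (no prediction at all).
a_dist = [1.0, 0.0]

llr = log_likelihood_ratio(a_dist, a_orig, R)
```

Because `a_orig` minimizes the quadratic form for the original segment, any other coefficient vector of the same form yields a ratio of at least one and hence a non-negative logarithm.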


2.3.1.3  Auditory model-based comparison measures.  The basis of auditory
model-based comparison measures is generally the short-time spectrum (magnitude windowed
Fourier spectrum) of the speech.  The attitude of those in the speech quality field is best
summarized by the following quote:


It is widely felt that distortions in the envelope of the short-time speech magnitude
spectrum are the main determinants of speech quality.  [25:pg 36]

This belief is fostered by the successes enjoyed using these techniques, which continue to
fuel work in this area.

All  of  the  methods  discussed  here  use  the  magnitudes  of  the  short-time  spectrum  to 
compute  the  measure.  The  reason  that  the  short-time  spectrum  is  considered  an  auditory 
model  is  that  for  many  years  (beginning  in  1843  with  Ohm’s  acoustic  law  [31])  the  basilar 
membrane  was  considered  to  be  performing  something  equivalent  to  a  frequency  analysis 
on  incoming  sound,  which  was  considered  to  be  well  represented  by  a  short-time  Fourier 
spectrum.  These  measures  are  generally  of  the  form 


\[ d(p,m) = \left[ \frac{\sum_{k=1}^{N} W[V_\phi(m,k), V_d(m,k), k] \, |F(m,k)|^p}{\sum_{k=1}^{N} W[V_\phi(m,k), V_d(m,k), k]} \right]^{1/p} , \]

where $d(p,m)$ is the distance for segment $m$ using power $p$ for $0 < p < \infty$, $N$ is the number
of frequency bands, $V_\phi(m,k)$ and $V_d(m,k)$ are the magnitude spectra for band $k$ of the
original and distorted speech, respectively, $W$ is a weighting factor, and $F$ is some measure
of spectral difference.  The measure $F$ can take on many forms, linear or non-linear.  As
before, the segment differences can be combined according to (1) above.

A  distinctly  different  and  slightly  more  sophisticated  method  is  the  Weighted  Slope 
Spectral  Distance  measure.  In  this  technique,  the  rates  of  change  (slopes)  of  different 
magnitude  spectral  bands  are  used  in  addition  to  the  values  of  the  bands  themselves,  to 
put  more  emphasis  on  the  location  of  the  spectral  peaks  than  on  their  heights. 

2.3.1.4  Composite comparison measures.  Since objective measures perform
with different degrees of accuracy on different types of distortions, it is logical to try to
combine different measures in hopes of seeing an overall improvement.  Using a
regression model describable as

\[ \hat{y}_i = b_0 + \sum_{j=1}^{k} b_j x_{ij} , \]

where $x_{ij}$ is an objective variable, $b_j$ is a regression coefficient, and $\hat{y}_i$ is the estimate
of composite acceptability, Quackenbush et al. [25] tried various combinations of the
previously described models as well as combinations of parametric models.  They tuned
the parameters for best performance on their database to provide upper bounds on model
performance.

In  [14],  Hayashi  and  Kitawaki  noted  that  for  high  quality  speech,  preferences  are 
based  on  how  easily  noise  that  is  just  above  the  detection  threshold  can  be  detected  in  a 
given  frequency  band.  They  propose  to  measure  this  by  linearly  combining  three  different 
measures,  the  segmental  SNR,  the  COSH  (a  measure  of  how  closely  the  spectrum  of  the 
noise  resembles  the  shape  of  the  spectrum  of  the  speech),  and  the  similarity  in  the  power 
envelopes  of  the  source  and  noise  in  the  time  domain.  They  also  use  multiple  regression 
analysis  to  optimize  their  performance. 

2.3.2  Evaluations of objective speech quality measures.  Over a 10-year period
(1975-1985), Quackenbush et al. [25] performed detailed analyses of many types of objective
measures  using  a  carefully  constructed  database.  The  basis  against  which  they  judged  the 
objective  measures  was  the  results  of  a  carefully  conducted  subjective  listening  test  against 
the  same  database. 




They examined standard measures in the field as well as newer ones and also developed
some composite measures which combine various simpler measures, optimized for use
with  their  extensive  database.  For  each  technique,  they  attempted  to  optimize  performance 
on  their  database,  in  order  to  provide  information  on  the  upper  bounds  of  performance. 
The  figures  of  merit  used  to  determine  performance  were  an  estimate  of  the  correlation 
coefficient  between  the  subjective  and  objective  measures  and  an  estimate  of  the  variance 
of  the  error.  The  authors  note  that  the  computed  values  are  of  use  only  when  comparing 
results  from  the  same  database. 

Of  the  non-composite  measures,  the  “best”  results  were  obtained  by  a  variant  of  the 
frequency  variant  segmental  SNR.  However,  this  technique  is  applicable  only  to  a  limited 
class  of  distortions,  and  so  is  of  little  interest  in  this  study.  Many  of  the  spectral  distance 
based  measures  did  quite  well,  as  did  the  LPC-based  log-area  ratio. 

Of  the  composite  measures,  Quackenbush  and  his  coworkers  were  able  to  design 
measures  that  did  better  than  any  of  the  simpler  measures,  with  the  exception  of  the 
limited-usefulness waveform comparison measures.  Better results were achieved by combining
dissimilar measures rather than similar ones.  Despite this success, it must be noted
that  these  composite  measures  were  optimized  on  their  specific  database  and  may  not 
perform  as  well  on  another. 

For  Hayashi  and  Kitawaki’s  measure,  they  achieved  excellent  results  when  using  all 
three  measures,  but  much  poorer  results  when  using  only  one  or  two.  This  supports  the 
idea  that  combining  dissimilar  methods  can  yield  superior  results  [14]. 

2.4  Important  characteristics  of  speech 

In  this  section,  the  characteristics  of  speech  that  should  be  well  represented  and 
preserved  in  a  “speech  space”  are  stated.  Information  from  the  previous  sections  will  be 
summarized  and  some  new  information  will  be  introduced. 

As  seen  from  the  success  of  the  spectral-based  distortion  measures,  the  magnitude 
spectrum  is  an  important  feature  of  speech.  Additional  support  for  this  view  comes  from  [3, 
29,  30],  where  the  intelligibility  degradation  due  to  differing  amounts  of  spectral  smearing 
is  studied. 

The magnitude spectrum is not the sole determinant of speech quality, however;
phase information is important, too.  Leek and Summers [19] found that in subjects with
normal hearing, spectral discrimination was slightly better with “peaked” waveforms (in
which the phase information resembles that of voiced speech) than with waveforms produced
by other phase conditions.  This lends support to the idea that high quality speech
processing techniques should produce waveforms in which the phase as well as the magnitude
spectrum resembles that of natural speech.

Likewise, the effective glottal excitation shape is important.  To produce natural
sounding speech, it is necessary to model voiced excitation as more than just a pulse
train [6, 18].

Voice  pitch  is  clearly  important.  Likewise,  energy  envelope  (up  to  a  multiplicative 
constant)  has  been  shown  to  be  of  use  in  distortion  measures  [14]. 

However,  preserving  the  exact  waveform  is  not  of  great  importance.  If  the  waveform 
is  perfectly  preserved,  then  there  is  no  perceptual  difference,  but  the  converse  is  not  true. 

To  summarize,  for  this  study,  the  important  characteristics  of  speech  are: 

1.  magnitude  spectrum, 

2.  waveform  characteristics  (i.e.,  “peaked”  waveforms  for  voiced  speech), 

3.  glottal  excitation  shape, 

4.  voice  pitch, 

5.  energy  envelope  shape. 

Each  of  these  characteristics  can  be  quantified  individually  by  representing  them  as 
parts of a simple speech production model.  In “speech” produced from a suitable production
model with representative input, each of these characteristics can be well preserved.

A  space  based  on  the  idea  of  explicitly  representing  each  of  these  key  characteristics 
is  presented  in  Appendix  A.  The  main  thrust  of  this  dissertation  diverged  from  the  initial 
idea  of  working  with  such  a  space,  and  so  the  development  of  this  space,  while  interesting,  is 
no  longer  relevant  to  the  main  text.  Instead,  the  path  taken  in  this  dissertation  preserves 
these  characteristics  implicitly  within  the  representation.  That  is,  while  the  important 
characteristics are preserved, they cannot be readily determined from the form of the representations
developed here.  Because of this, the preservation of these key characteristics
can  only  be  shown  through  listening  tests. 




III.  Representation of elements of the Hardy spaces, $H^p$

Work presented in this chapter shows the existence of a class of representations of
elements of a Hardy space, $H^p(\mathbb{D})$, where $\mathbb{D}$ is the open unit disk in $\mathbb{C}$.  What will be
presented here is a proof of an extension of a theorem by Luecking [20] dealing with
Carleson inequalities in the Hardy spaces $H^p(\mathbb{D})$, which is the main result of this chapter.
This theorem establishes forward and reverse Carleson inequalities where the sample points
used in these inequalities are chosen from appropriate sets of a more general nature than
those used in Luecking’s theorem.

An additional result is an extension to a second theorem by Luecking which uses
the Carleson inequalities to establish representations for elements of the Hardy spaces.
This second extension, while trivial to prove given the work in [33], is the one that will
be of primary importance in establishing a frame for $H^2(\mathbb{D})$, which is done in Chapter V.
This frame for $H^2(\mathbb{D})$ is a key component in the frame designed in Chapter VI for speech
representation.

In  Section  3.1,  some  preliminary  definitions  and  notations  are  given,  followed  by 
statements  of  the  Carleson  inequality  theorem  being  extended  and  my  extension  to  it. 
Sections  3.2  through  3.4  contain  the  necessary  lemmas  and  theorems  to  prove  the  main 
result  of  this  chapter,  the  proof  of  which  is  found  in  Section  3.5.  The  second  result  is 
stated  and  proven  in  Section  3.6. 

In  order  to  smooth  the  flow  of  the  presentation,  some  of  the  proofs  of  lemmas  and 
theorems  stated  in  this  chapter  have  been  put  in  Appendix  B. 

3.1  Introduction 

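Definition 3.1.1  Let $\mathbb{D}$ be the open unit ball in $\mathbb{C}$.  For $0 < p < \infty$, the Hardy space
$H^p(\mathbb{D})$ is defined to be the space of analytic functions $f$ on $\mathbb{D}$ satisfying

\[ \|f\|_p = \|f\|_{H^p} = \sup_{0<r<1} \left( \int_{\partial\mathbb{D}} |f(rz)|^p \, d\sigma(z) \right)^{1/p} < \infty , \]

where $\partial\mathbb{D}$ denotes the boundary of $\mathbb{D}$ and $\sigma$ denotes the normalized Lebesgue measure on
$\partial\mathbb{D}$.  The Hardy space $H^\infty(\mathbb{D})$ is defined to be the space of analytic functions $f$ on $\mathbb{D}$
satisfying

\[ \|f\|_\infty = \|f\|_{H^\infty} = \sup_{|z|<1} |f(z)| < \infty . \]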




Definition 3.1.2  The pseudo-hyperbolic metric, denoted $\rho$, is defined by

\[ \rho(x,z) = \left| \frac{x - z}{1 - \bar{x} z} \right| . \]

The pseudo-hyperbolic ball about $a$ of radius $r$, denoted $K(a,r)$, is given by

\[ K(a,r) = \{ z \in \mathbb{D} : \rho(a,z) < r \} . \]
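The definition can be explored numerically with Python's built-in complex numbers. This is a minimal sketch with hypothetical function names and sample points. It illustrates that two points at the same Euclidean distance from a center near the boundary are at different pseudo-hyperbolic distances.

```python
def rho(x, z):
    """Pseudo-hyperbolic distance between points x, z of the unit disk."""
    return abs(x - z) / abs(1 - x.conjugate() * z)


def in_ball(z, a, r):
    """True if z lies in the pseudo-hyperbolic ball K(a, r)."""
    return abs(z) < 1 and rho(a, z) < r


a = 0.9 + 0.0j
near_boundary = 0.95 + 0.0j   # Euclidean distance 0.05 from a
near_center = 0.85 + 0.0j     # also Euclidean distance 0.05 from a

d_out = rho(a, near_boundary)
d_in = rho(a, near_center)
```

The point closer to the unit circle is farther from `a` in this metric, so a ball of fixed pseudo-hyperbolic radius shrinks in Euclidean size as its center approaches the boundary.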


Luecking’s work was done in $H^p(\mathbb{B}_N)$, and so his definition of $\rho$ is an extension of
that used here.  His theorem is as follows.

Theorem 3.1.3 (Luecking [20], [Thm 5.1]) Let $\{r_n\}$ be a sequence of numbers in the open
interval $(0,1)$ increasing to 1.  Let the doubly indexed set $\{a_{n,k} \in \mathbb{C}^N \mid k = 1, \ldots, k(n),\ n \in \mathbb{Z}^+\}$
be such that for each $n,k$, $|a_{n,k}| = r_n$ and for each $n$, $\partial(r_n \mathbb{B}_N) \subset \bigcup_{k=1}^{k(n)} K(a_{n,k}, \delta)$ for
some fixed $\delta$.  If $p > 0$ and $\delta > 0$ is sufficiently small (where $\delta$ depends on $p$), then for
some $0 < C < \infty$,

\[ \|f\|_p^p \le C \sup_n \sum_{k=1}^{k(n)} |f(a_{n,k})|^p (1 - r_n)^N \]

for every $f \in H^p(\mathbb{B}_N)$.

The proof of his theorem is followed by a statement to the effect that as long as the
separation condition

\[ \rho(a_{n,k}, a_{n,k'}) > \epsilon > 0 \]

for some $\epsilon > 0$ is satisfied, then the reverse inequality

\[ \sup_n \sum_{k} |f(a_{n,k})|^p (1 - r_n)^N \le C \|f\|_p^p , \quad f \in H^p(\mathbb{B}_N) , \tag{2} \]

is true [20].

The extension to this theorem to follow, proven only for $p > 1$ and $N = 1$, loosens
the constraint that the points $a_{n,k}$ lie on increasing radii with respect to $n$.  Instead, it
is allowed for each $n$, that the points $a_{n,k}$ lie in an appropriately constrained set, $S_n$.  In
addition, the intended use of this theorem requires the upper bound as given in (2), so that
that inequality is included in the theorem, together with constraints on $\delta$ to guarantee the
existence of such an $\epsilon$.

For  ease  in  stating  the  many  lemmas  and  theorems  to  come,  the  necessary  constraints 
on  the  sets  Sn  to  be  used  in  the  following  theorem  will  be  consolidated  in  the  following 
condition. 

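Condition A  The sequence $\{S_n\}$ of compact subsets of $\mathbb{D}$ satisfies Condition A if the
following hold:

1) For each $n \in \mathbb{Z}^+$, there exists $\gamma_n \subset S_n$, where $\gamma_n$ is a closed path with winding
number $\omega(\gamma_n, 0) = 1$.

2) There exists a fixed $0 < \delta_r < 1$ such that for $r_n : [-\pi, \pi) \to [0,1)$ and $M_n : [-\pi, \pi) \to [0,1)$
defined according to

\[ M_n(\theta) = \sup\{ |z| : z \in S_n \text{ and } \rho(z,s) < \delta_r \text{ for some } s = re^{i\theta} \in S_n \} \]

and

\[ r_n(\theta) = \inf\{ |z| : z \in S_n \text{ and } \rho(z,s) < \delta_r \text{ for some } s = re^{i\theta} \in S_n \} , \]

with $\overline{M}_n \equiv \max_\theta M_n(\theta)$ and $\underline{r}_n \equiv \min_\theta r_n(\theta)$, we have

\[ r_n \to 1 \]

as $n \to \infty$;

\[ \limsup_{n \to \infty} \frac{\overline{M}_n - \underline{r}_n}{1 - \overline{M}_n \underline{r}_n} = C_s < 1 ; \tag{3} \]

and

\[ \frac{M_n - r_n}{1 - M_n r_n} \to 0 \tag{4} \]

as $n \to \infty$.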

Some discussion of the implications of Condition A is in order.  Since each set $S_n$
contains a closed path $\gamma_n$ with winding number $\omega(\gamma_n, 0) = 1$, we know that $S_n$ surrounds
the origin.  Since $r_n \to 1$, we know that eventually, the $S_n$ are torus-like in that their center
(about the origin) is hollow.  Equation (4) implies that the sets $S_n$ become thin along any
radial from the origin and that the path $\gamma_n$ does not vary “too much.”

The  main  result  of  this  chapter  is  the  following  theorem,  the  proof  of  which  is  to  be 
found  in  Section  3.5  below. 

Theorem 3.1.4  Fix $1 < p < \infty$.  Suppose $\{S_n\}$ satisfies Condition A and let $\delta_r$ be defined
as in Condition A.  Then there exist $0 < \epsilon < \delta < \min\{\delta_r, |\}$ and $0 < c < C < \infty$ such that
whenever $\{a_{n,k}\}$ satisfies $a_{n,k} \in S_n$, $\rho(a_{n,k}, a_{n,j}) > \epsilon$ for all $j \ne k$, and $S_n \subset \bigcup_k K(a_{n,k}, \delta)$,
the inequality

\[ c \|f\|_p^p \le \sup_n \sum_{k=1}^{k(n)} |f(a_{n,k})|^p (1 - |a_{n,k}|) \le C \|f\|_p^p \tag{5} \]

is true for all $f \in H^p(\mathbb{D})$.

The following section contains definitions and basic properties of the Hardy spaces
and the metric $\rho$, which will be used extensively in the remainder of the chapter.  Following
that, Section 3.3 contains lemmas showing various inequalities that are necessary
to complete the proofs to follow.  Section 3.4 contains other interesting lemmas and theorems
required to prove the main result, which is done in Section 3.5.  Following this,
Section 3.6 contains proofs of two theorems extended by Ward and Partington that make
Theorem 3.1.4 useful in applications.


3.2  Preliminaries 

This  section  contains  definitions  and  proofs  of  some  well  known  facts  involving  the 
Hardy  spaces  and  the  metric  p. 

3.2.1  Hardy  Spaces.  The  following  two  theorems  concerning  Hardy  spaces  are 
given  without  proof,  and  can  be  found  in,  e.g.,  [12]. 

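Theorem 3.2.1 (Hardy-Littlewood) Let $0 < p < \infty$ and $f \in H^p(\mathbb{D})$.  Define $F$ by

\[ F(e^{i\theta}) = \sup_{0<r<1} |f(re^{i\theta})| . \]

Then $F \in L^p(\partial\mathbb{D})$ and $\|F\|_{L^p} \le B_p \|f\|_{H^p}$, where $B_p < \infty$ depends only on $p$.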




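Theorem 3.2.2  Let $0 < p \le \infty$ and let $f \in H^p(\mathbb{D})$.  Then $f(e^{i\theta}) = \lim_{r \to 1} f(re^{i\theta})$ exists
for almost every $\theta \in [-\pi, \pi)$ and $f \in L^p(\partial\mathbb{D})$.  For $1 < p < \infty$, we have

\[ \|f\|_{H^p} = \|f\|_{L^p} = \left( \int_{\partial\mathbb{D}} |f(z)|^p \, d\sigma(z) \right)^{1/p} , \]

and for $p = \infty$, we have

\[ \|f\|_{H^\infty} = \|f\|_{L^\infty} = \operatorname*{ess\,sup}_{-\pi \le \theta < \pi} |f(e^{i\theta})| . \]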

3.2.2  The pseudo-hyperbolic metric, $\rho$.  The metric $\rho$ and the mapping $\phi_a$ (defined
below) are, as will be seen, extremely useful in dealing with the Hardy spaces.  Their
important properties to be used here are given below.


Lemma  3.2.3  For  every  a  £  D,  <j>a :D  — ^  D,  defined  by 


<Pa(z)  = 


a  —  z 

1  —  az 


is  well-defined  and  bijective,  with  inverse  (j)al  =  <f>a. 

Proof. Let $a \in \mathbb{D}$. We wish to show that for any $z \in \mathbb{D}$, $\phi_a(z) \in \mathbb{D}$ also. To
do this, it is necessary and sufficient to show that

\[ |\phi_a(z)| = \left| \frac{a - z}{1 - \bar{a} z} \right| < 1 . \]

Calculating, we see

\[ \left| \frac{a - z}{1 - \bar{a} z} \right|^2
   = \frac{(a - z)(\bar{a} - \bar{z})}{(1 - \bar{a} z)(1 - a\bar{z})}
   = \frac{|a|^2 + |z|^2 - 2\,\mathrm{Re}(a\bar{z})}{1 + |a|^2|z|^2 - 2\,\mathrm{Re}(a\bar{z})} . \]

Since both the numerator and denominator are non-negative, to show that $|\phi_a(z)| < 1$, it
is sufficient to show that $|a|^2 + |z|^2 < 1 + |a|^2|z|^2$, which is shown by

\[ 1 + |a|^2|z|^2 - |a|^2 - |z|^2 = (1 - |a|^2)(1 - |z|^2) > 0, \]

since $a, z \in \mathbb{D}$ implies $|a|, |z| < 1$. Therefore, $\phi_a$ is well-defined.


To see that $\phi_a$ is bijective, first note that it is its own inverse, that is, for any $z \in \mathbb{D}$,

\[ \phi_a(\phi_a(z))
   = \frac{a - \dfrac{a - z}{1 - \bar{a} z}}{1 - \bar{a}\,\dfrac{a - z}{1 - \bar{a} z}}
   = \frac{z(1 - |a|^2)}{1 - |a|^2}
   = z . \]

This shows that $\phi_a$ is both injective and surjective. That it is injective is seen by the fact
that it is invertible on its range (with inverse $\phi_a^{-1} = \phi_a$). That it is surjective is seen by
the fact that for any $z \in \mathbb{D}$, $\phi_a(y) = z$ where $y = \phi_a(z)$. Therefore, for any $a \in \mathbb{D}$, $\phi_a$ is
bijective on $\mathbb{D}$. □

Next, it will be shown that $\rho$ defines a metric on the set $\mathbb{D}$. To do this, we will first
show one of the nice properties of $\rho$.

Lemma 3.2.4 For any $a, x, z \in \mathbb{D}$,

\[ \rho(\phi_a(x), \phi_a(z)) = \rho(x, z) . \]


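Proof. By straightforward calculations,

\[ \begin{aligned}
\rho(\phi_a(x), \phi_a(z))
  &= \left| \frac{\phi_a(x) - \phi_a(z)}{1 - \overline{\phi_a(x)}\,\phi_a(z)} \right| \\
  &= \left| \frac{\dfrac{a - x}{1 - \bar{a} x} - \dfrac{a - z}{1 - \bar{a} z}}
                 {1 - \overline{\left(\dfrac{a - x}{1 - \bar{a} x}\right)}\,\dfrac{a - z}{1 - \bar{a} z}} \right| \\
  &= \frac{|(1 - |a|^2)(z - x)|}{|(1 - |a|^2)(1 - \bar{x} z)|} \\
  &= \rho(x, z) .
\end{aligned} \]

□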
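The two facts just proved (that $\phi_a$ is its own inverse, and that $\rho$ is invariant under $\phi_a$) are easy to spot-check numerically. The following is only a sketch: the function names `phi`, `rho`, `random_disc_point`, and `max_errors` are hypothetical, the definition $\rho(a,z) = |\phi_a(z)|$ is taken from the text, and the sample points are kept inside $|z| \le 0.9$ purely to limit floating-point error.

```python
import cmath
import random


def phi(a: complex, z: complex) -> complex:
    """Disc automorphism phi_a(z) = (a - z) / (1 - conj(a) z)."""
    return (a - z) / (1 - a.conjugate() * z)


def rho(a: complex, z: complex) -> float:
    """Pseudo-hyperbolic distance, using rho(a, z) = |phi_a(z)|."""
    return abs(phi(a, z))


def random_disc_point(rng: random.Random, max_modulus: float = 0.9) -> complex:
    """Random point with |z| <= max_modulus (kept away from the unit circle)."""
    radius = max_modulus * rng.random() ** 0.5
    return cmath.rect(radius, rng.uniform(-cmath.pi, cmath.pi))


def max_errors(trials: int = 2000, seed: int = 0):
    """Largest observed violation of phi_a(phi_a(z)) = z and of Lemma 3.2.4."""
    rng = random.Random(seed)
    involution_err = 0.0
    invariance_err = 0.0
    for _ in range(trials):
        a, x, z = (random_disc_point(rng) for _ in range(3))
        involution_err = max(involution_err, abs(phi(a, phi(a, z)) - z))
        invariance_err = max(
            invariance_err, abs(rho(phi(a, x), phi(a, z)) - rho(x, z))
        )
    return involution_err, invariance_err
```

Both returned values should be at the level of rounding error; a numerical check of this kind illustrates the lemmas but does not replace the proofs.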


The  following  proof  that  p  is  a  metric  is  only  difficult  in  that  the  typical  techniques 
for  showing  the  triangle  inequality  do  not  work. 

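Proposition 3.2.5 The function $\rho$ defines a metric on $\mathbb{D}$.

Proof. The properties of symmetry and nonnegativity can be seen by inspection.
The positivity property that $\rho(x, y) > 0$ when $x \ne y$ is trivial also. This leaves the triangle
inequality, that is, for all $x, y, z \in \mathbb{D}$,

\[ \rho(x, z) \le \rho(x, y) + \rho(y, z) . \]

Using Lemma 3.2.4 and the equalities $\rho(a, b) = |\phi_a(b)|$ and $\phi_a(a) = 0$, this inequality is
equivalent to

\[ \rho(\tilde{x}, \tilde{z}) \le |\tilde{x}| + |\tilde{z}| , \]

where $\tilde{x} = \phi_y(x)$ and $\tilde{z} = \phi_y(z)$. Working with $(\rho(\tilde{x}, \tilde{z}))^2$, we see that

\[ (\rho(\tilde{x}, \tilde{z}))^2
   = \frac{|\tilde{x} - \tilde{z}|^2}{|1 - \bar{\tilde{x}} \tilde{z}|^2}
   = \frac{(\tilde{x} - \tilde{z})(\bar{\tilde{x}} - \bar{\tilde{z}})}{(1 - \bar{\tilde{x}} \tilde{z})(1 - \tilde{x} \bar{\tilde{z}})}
   = \frac{|\tilde{x}|^2 + |\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta}{1 + |\tilde{x}|^2|\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta} , \]

where $\theta = \arg(\tilde{x}\bar{\tilde{z}})$. Multiplying the right-hand side of the equation by
$(|\tilde{x}| + |\tilde{z}|)^2 / (|\tilde{x}| + |\tilde{z}|)^2$ (the case $\tilde{x} = \tilde{z} = 0$ being trivial), we have

\[ (\rho(\tilde{x}, \tilde{z}))^2
   = (|\tilde{x}| + |\tilde{z}|)^2 \,
     \frac{|\tilde{x}|^2 + |\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta}
          {(|\tilde{x}| + |\tilde{z}|)^2 (1 + |\tilde{x}|^2|\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta)} . \]

Note that showing that $\rho(\tilde{x}, \tilde{z}) \le |\tilde{x}| + |\tilde{z}|$ is (of course) equivalent to showing
$(\rho(\tilde{x}, \tilde{z}))^2 \le (|\tilde{x}| + |\tilde{z}|)^2$, so to complete the proof, it is sufficient to show that

\[ \frac{|\tilde{x}|^2 + |\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta}
        {(|\tilde{x}| + |\tilde{z}|)^2 (1 + |\tilde{x}|^2|\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta)} \le 1 . \]

Since both numerator and denominator of the fraction are positive, this can be done by
showing $|\tilde{x}|^2 + |\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta \le (|\tilde{x}| + |\tilde{z}|)^2 (1 + |\tilde{x}|^2|\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta)$. Working with
the quantity

\[ \begin{aligned}
& (|\tilde{x}| + |\tilde{z}|)^2 (1 + |\tilde{x}|^2|\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta) - (|\tilde{x}|^2 + |\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta) \\
& \qquad = |\tilde{x}||\tilde{z}| \left[ \left( 2 + |\tilde{x}||\tilde{z}| (|\tilde{x}| + |\tilde{z}|)^2 \right) - 2 \left( (|\tilde{x}| + |\tilde{z}|)^2 - 1 \right) \cos\theta \right] ,
\end{aligned} \]

we can see that for the triangle inequality to be true, we must have that
$(2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2((|\tilde{x}| + |\tilde{z}|)^2 - 1)\cos\theta \ge 0$. Continuing,

\[ \begin{aligned}
(2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2((|\tilde{x}| + |\tilde{z}|)^2 - 1)\cos\theta
 &\ge (2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2\left| ((|\tilde{x}| + |\tilde{z}|)^2 - 1)\cos\theta \right| \\
 &\ge (2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2\left| (|\tilde{x}| + |\tilde{z}|)^2 - 1 \right| .
\end{aligned} \]

In the case where $|\tilde{x}| + |\tilde{z}| \le 1$, this is easily shown by

\[ \begin{aligned}
(2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2\left| (|\tilde{x}| + |\tilde{z}|)^2 - 1 \right|
 &= (2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) + 2((|\tilde{x}| + |\tilde{z}|)^2 - 1) \\
 &= (|\tilde{x}||\tilde{z}| + 2)(|\tilde{x}| + |\tilde{z}|)^2 \ge 0 .
\end{aligned} \]

The case where $|\tilde{x}| + |\tilde{z}| > 1$ is slightly more difficult. Assuming $|\tilde{x}| + |\tilde{z}| > 1$, we have

\[ \begin{aligned}
(2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2\left| (|\tilde{x}| + |\tilde{z}|)^2 - 1 \right|
 &= (2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2((|\tilde{x}| + |\tilde{z}|)^2 - 1) \\
 &= 4 + (|\tilde{x}||\tilde{z}| - 2)(|\tilde{x}| + |\tilde{z}|)^2 .
\end{aligned} \]

To show that this right-hand side is non-negative, first define $C = |\tilde{x}| + |\tilde{z}|$ and minimize
the value of $4 + (|\tilde{x}||\tilde{z}| - 2)(|\tilde{x}| + |\tilde{z}|)^2 = 4 + (|\tilde{x}|(C - |\tilde{x}|) - 2)C^2$. Using the usual introductory
calculus techniques, one can determine a candidate extreme point of $|\tilde{x}| = C/2$, which on
further inspection is revealed to be a maximum. This implies that the minimum is at an
end point of the allowable range of $|\tilde{x}|$, that is, at $|\tilde{x}| = 1$ or $|\tilde{x}| = C - 1$. By the symmetry
of the function, the two endpoints yield equivalent minima, and so we find that

\[ 4 + (|\tilde{x}||\tilde{z}| - 2)(|\tilde{x}| + |\tilde{z}|)^2 \ge 4 - 3C^2 + C^3 . \]

Solving for the minimal value of the right-hand side in the allowable range of $C$, we find a
minimum at $C = 2$, which gives

\[ 4 + (|\tilde{x}||\tilde{z}| - 2)(|\tilde{x}| + |\tilde{z}|)^2 \ge 4 - 12 + 8 = 0 . \]

Therefore

\[ (2 + |\tilde{x}||\tilde{z}|(|\tilde{x}| + |\tilde{z}|)^2) - 2((|\tilde{x}| + |\tilde{z}|)^2 - 1)\cos\theta \ge 0 , \]

which implies

\[ \frac{|\tilde{x}|^2 + |\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta}
        {(|\tilde{x}| + |\tilde{z}|)^2 (1 + |\tilde{x}|^2|\tilde{z}|^2 - 2|\tilde{x}||\tilde{z}| \cos\theta)} \le 1 , \]

giving the desired inequality of

\[ (\rho(\tilde{x}, \tilde{z}))^2 \le (|\tilde{x}| + |\tilde{z}|)^2 . \]

Substituting to regain our original variables, we get

\[ \begin{aligned}
\rho(\phi_y(x), \phi_y(z)) &\le |\phi_y(x)| + |\phi_y(z)| , \\
\rho(x, z) &\le \rho(x, y) + \rho(y, z) .
\end{aligned} \]

□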
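As a sanity check on the proposition, the triangle inequality and the cubic bound $4 - 3C^2 + C^3 \ge 0$ on $1 \le C \le 2$ can be tested numerically. This is a sketch under stated assumptions: the helper names are hypothetical, $\rho(a,z) = |a - z| / |1 - \bar{a}z|$ is the definition implied by $\rho(a,b) = |\phi_a(b)|$, and random sampling only probes the inequality, it does not prove it.

```python
import cmath
import random


def rho(a: complex, z: complex) -> float:
    """Pseudo-hyperbolic distance rho(a, z) = |a - z| / |1 - conj(a) z|."""
    return abs(a - z) / abs(1 - a.conjugate() * z)


def cubic(c: float) -> float:
    """The polynomial 4 - 3 C^2 + C^3 from the proof; it equals (C - 2)^2 (C + 1)."""
    return 4 - 3 * c**2 + c**3


def min_triangle_gap(trials: int = 5000, seed: int = 1) -> float:
    """Smallest observed value of rho(x,y) + rho(y,z) - rho(x,z) over random triples."""
    rng = random.Random(seed)

    def point() -> complex:
        return cmath.rect(0.95 * rng.random() ** 0.5, rng.uniform(-cmath.pi, cmath.pi))

    gap = float("inf")
    for _ in range(trials):
        x, y, z = point(), point(), point()
        gap = min(gap, rho(x, y) + rho(y, z) - rho(x, z))
    return gap


def min_cubic_on_interval(steps: int = 1000) -> float:
    """Minimum of the cubic on an even grid over [1, 2]."""
    return min(cubic(1 + k / steps) for k in range(steps + 1))
```

The smallest triangle gap should be non-negative up to rounding, and the grid minimum of the cubic is attained at $C = 2$, consistent with the factorization $(C-2)^2(C+1)$.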


3.3  Lemmas  concerning  various  inequalities 

This section contains lemmas showing various inequalities that will be used in the
more interesting lemmas and theorems to follow in Sections 3.4 and 3.5. The proofs of
these lemmas are relegated to Appendix B.

First, throughout this chapter, it will be necessary to make estimates based on the
Lebesgue measure of the set $K(a,r)$ in $\mathbb{R}^2$, denoted $m(K(a,r))$, normalized by $\pi$ so that
$m(\mathbb{D}) = 1$. That is the topic of the following two lemmas. The nature of these sets often
makes it easier to work with Euclidean balls instead, either larger or smaller, as appropriate.
That is the reason for the following lemma. Lemma 3.3.2 concerns an alternate definition
of the set $K(a,r)$.

Lemma 3.3.1 Let $a \in \mathbb{D}$ and $r \in [0,1)$. Let $B(a,\delta)$ be the Euclidean ball centered at $a$ of
radius $\delta$, that is,

\[ B(a,\delta) = \{ z \in \mathbb{D} : |z - a| < \delta \} . \]

The largest $\delta$ such that $B(a,\delta) \subset K(a,r)$ is given by

\[ \delta = \frac{r(1 - |a|^2)}{1 + r|a|} . \]

The smallest $\delta$ such that $K(a,r) \subset B(a,\delta)$ is given by

\[ \delta = \frac{r(1 - |a|^2)}{1 - r|a|} . \]

Proof. See Appendix B.


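Lemma 3.3.2 For $a \in \mathbb{D}$, $r \in (0,1)$, the set $K(a,r)$ is given by $K(a,r) = B(\tilde{a}, \tilde{r})$, where

\[ \tilde{a} = \frac{a(1 - r^2)}{1 - r^2|a|^2} \quad \text{and} \quad \tilde{r} = \frac{r(1 - |a|^2)}{1 - r^2|a|^2} . \]

Proof. See Appendix B.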
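The geometry behind Lemmas 3.3.1 and 3.3.2 can be checked numerically. The sketch below assumes the standard closed form for the pseudo-hyperbolic disc, namely Euclidean center $a(1-r^2)/(1-r^2|a|^2)$ and radius $r(1-|a|^2)/(1-r^2|a|^2)$, and the inner and outer radii $r(1-|a|^2)/(1 \pm r|a|)$; the function names are hypothetical.

```python
import cmath
import math


def rho(a: complex, z: complex) -> float:
    """Pseudo-hyperbolic distance rho(a, z) = |a - z| / |1 - conj(a) z|."""
    return abs(a - z) / abs(1 - a.conjugate() * z)


def k_disc(a: complex, r: float):
    """Assumed Euclidean centre and radius of K(a, r) = {z : rho(a, z) < r}."""
    d = 1 - r * r * abs(a) ** 2
    return a * (1 - r * r) / d, r * (1 - abs(a) ** 2) / d


def boundary_rho_error(a: complex, r: float, samples: int = 360) -> float:
    """Max deviation of rho(a, .) from r on the Euclidean circle returned by k_disc."""
    centre, radius = k_disc(a, r)
    return max(
        abs(rho(a, centre + radius * cmath.rect(1, 2 * math.pi * j / samples)) - r)
        for j in range(samples)
    )


def inner_outer_radii(a: complex, r: float):
    """Radii of the largest ball B(a, .) inside K(a, r) and the smallest one containing it."""
    inner = r * (1 - abs(a) ** 2) / (1 + r * abs(a))
    outer = r * (1 - abs(a) ** 2) / (1 - r * abs(a))
    return inner, outer
```

Since the Euclidean center lies on the segment from the origin to $a$, the inner radius equals the disc radius minus the distance from $a$ to that center, and the outer radius equals their sum.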


The following lemma is needed for various inequalities to come, and simply states
how large $|z|$ can be, given $\rho(a,z) \le \delta$.

Lemma 3.3.3 Let $\delta \in (0,1)$ and let $w \in \mathbb{D}$. If $\rho(w,z) \le \delta$ then

\[ |z| \le \frac{\delta + |w|}{1 + \delta|w|} , \]

and if $w \ne 0$, this upper bound is achieved only at the point

\[ z = \left( \frac{\delta + |w|}{1 + \delta|w|} \right) \frac{w}{|w|} . \]

Proof. See Appendix B.
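A numerical illustration of the modulus bound in Lemma 3.3.3 follows. It is a sketch only: it assumes the bound $(\delta + |w|)/(1 + \delta|w|)$ and takes the extremal point to lie on the ray through $w$; the boundary $\rho(w,z) = \delta$ is parametrized as $z = \phi_w(\delta e^{it})$, and all function names are hypothetical.

```python
import cmath
import math


def phi(a: complex, z: complex) -> complex:
    """Disc automorphism phi_a(z) = (a - z) / (1 - conj(a) z)."""
    return (a - z) / (1 - a.conjugate() * z)


def rho(a: complex, z: complex) -> float:
    """Pseudo-hyperbolic distance rho(a, z) = |phi_a(z)|."""
    return abs(phi(a, z))


def modulus_bound(w: complex, delta: float) -> float:
    """Claimed upper bound on |z| when rho(w, z) <= delta."""
    return (delta + abs(w)) / (1 + delta * abs(w))


def max_modulus_on_circle(w: complex, delta: float, samples: int = 3600) -> float:
    """Largest |z| over sampled points z = phi_w(delta e^{it}), which satisfy rho(w, z) = delta."""
    return max(
        abs(phi(w, cmath.rect(delta, 2 * math.pi * j / samples)))
        for j in range(samples)
    )


def extremal_point(w: complex, delta: float) -> complex:
    """Point on the ray through w (w != 0) whose modulus equals the bound."""
    return modulus_bound(w, delta) * w / abs(w)
```

For example, with $|w| = 0.5$ and $\delta = 0.4$ the bound is $0.9/1.2 = 0.75$, and the sampled maximum approaches this value from below.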

The  following  lemma  is  simply  a  “utility”  lemma  concerning  set  inclusion  for  use  in 
a  later  proof. 

Lemma 3.3.4 Let $z \in \mathbb{D}$ and $r \in (0,1)$. If $a \in B(z, r(1 - |z|^2)/4)$, then $K(a, r/2) \subset K(z, r)$.

Proof.  See  Appendix  B. 


The  following  lemma  will  be  of  use  in  various  proofs  where  a  bound  on  the  angle 
subtended  by  the  set  K(a,6)  (as  measured  from  the  origin)  is  needed. 




Lemma  3.3.5  Fix  6  £  (0, 1)  and  C  >  4.  Then  there  exists  r  £  (0, 1)  such  that  whenever 

M  >  r, 

where  A 6  is  the  angle  subtended  by  K(a,6)  with  respect  to  the  origin . 

Proof.  See  Appendix  B. 

The remaining lemmas in this section are self-explanatory. They all establish
inequalities.

Lemma  3.3.6  Let  r  E  (0, 1)  and  a:z  ED  such  that  p(a, z)  <  r;  then 

....  1 . .  <  J±L  < 

1  —  \a\\z\  1  —  |a|2  1  -  |a|2 

and 

1  -  |a|2  2  1 

r2(l  —  |aj |^|)2  —  (1  —  r)  r)) 

Proof.  See  Appendix  B. 

Lemma  3.3.7  For  all  a  ED  and  r,  e  £  (0, 1), 

i  <  1 

m(K(a,er))  ~  \e2(l  —  r)2/  m(K(a, r)) 

Proof.  See  Appendix  B. 

Lemma 3.3.8 Let $z, w \in \mathbb{D}$ such that $\rho(z,w) < \frac{1}{3}$; then $|z - w| < 1 - |z|$.

Proof.  See  Appendix  B. 


Lemma 3.3.9 Let $r, \delta \in (0,1)$ and suppose for some $w, z \in \mathbb{D}$ that $\rho(w,z) < \delta$. Then

\[ \frac{1}{m(K(z,r))} \le \frac{16(1 - r^2\delta^2)^2}{(1 - \delta^2)^2 (1 - r^2)^2} \, \frac{1}{m(K(w,r))} \]

and


l-\z 


1  < 


1  —  6J  |1  -  wz\ 


Proof.  See  Appendix  B. 


3.4  Supporting  Lemmas  and  Theorems 

The  more  significant  lemmas  and  theorems  needed  to  prove  the  main  result  are  found 
in  this  section. 

The next two lemmas deal with estimates of the value of $f$ at a point $a$. They will
be required later in the proof of the main result.

Lemma 3.4.1 Fix $p \ge 1$. Then there exists $C < \infty$ depending only on $r$ such that

\[ |f(a)|^p \le C \int_{K(a,r)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(a,r))} \]

for all $f$ analytic in $\mathbb{D}$, $a \in \mathbb{D}$, and $r \in (0,1)$.

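Proof. Let $f$ be analytic in $\mathbb{D}$. From Cauchy's Theorem, we have that for all
$r' \in (0,1)$,

\[ f(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(r'e^{i\theta}) \, d\theta . \]

Integrating both sides with respect to $r' \, dr'$ we find

\[ \begin{aligned}
\int_0^r f(0) \, r' \, dr' &= \frac{1}{2\pi} \int_0^r \int_{-\pi}^{\pi} f(r'e^{i\theta}) \, d\theta \, r' \, dr' , \\
r^2 f(0) &= \int_{r\mathbb{D}} f(\zeta) \, dm(\zeta) ,
\end{aligned} \]

which give

\[ f(0) = \frac{1}{r^2} \int_{r\mathbb{D}} f(\zeta) \, dm(\zeta) . \]

This leads to

\[ |f(0)| = \frac{1}{r^2} \left| \int_{r\mathbb{D}} f(\zeta) \, dm(\zeta) \right|
          \le \frac{1}{r^2} \int_{r\mathbb{D}} |f(\zeta)| \, dm(\zeta) . \]

In the case where $p > 1$, define $q$ by $\frac{1}{q} = 1 - \frac{1}{p}$. Invoking the Hölder inequality, we see
that

\[ |f(0)| \le \frac{1}{r^2} \int_{r\mathbb{D}} |f(\zeta)| \, dm(\zeta)
          \le \frac{1}{r^2} \left( \int_{r\mathbb{D}} |f(\zeta)|^p \, dm(\zeta) \right)^{1/p} \left( \int_{r\mathbb{D}} dm(\zeta) \right)^{1/q} . \]

This leads to

\[ |f(0)|^p \le \frac{(m(r\mathbb{D}))^{p/q}}{r^{2p}} \int_{r\mathbb{D}} |f(\zeta)|^p \, dm(\zeta) . \]

Since $\frac{1}{p} + \frac{1}{q} = 1$ and $m(r\mathbb{D}) = r^2$, this gives

\[ |f(0)|^p \le \frac{1}{r^2} \int_{r\mathbb{D}} |f(\zeta)|^p \, dm(\zeta) . \]

Now, to extend this to $|f(a)|$, let us define a new function, $g : \mathbb{D} \to \mathbb{C}$, according to
$g(z) = f(\phi_a(z))$. By hypothesis, $f$ is analytic. Also, $\phi_a : \mathbb{D} \to \mathbb{D}$ is analytic for all $a \in \mathbb{D}$.
Therefore, $g$ is analytic also, and we have

\[ \begin{aligned}
|g(0)|^p &\le \frac{1}{r^2} \int_{r\mathbb{D}} |g(\zeta)|^p \, dm(\zeta) , \\
|f(a)|^p = |f(\phi_a(0))|^p &\le \frac{1}{r^2} \int_{r\mathbb{D}} |f(\phi_a(\zeta))|^p \, dm(\zeta) .
\end{aligned} \]

Making the substitution $z = \phi_a(\zeta)$, and using the fact that $\phi_a^{-1} = \phi_a$, we get

\[ |f(a)|^p \le \frac{1}{r^2} \int_{K(a,r)} |f(z)|^p \, \frac{1}{|\phi_a'(\phi_a(z))|^2} \, dm(z) . \]

The derivative of $\phi_a$, denoted $\phi_a' : \mathbb{D} \to \mathbb{C}$, is given by

\[ \phi_a'(z) = -\frac{1 - |a|^2}{(1 - \bar{a} z)^2} , \]

and so $\phi_a'(\phi_a(z))$ is given by

\[ \phi_a'(\phi_a(z)) = -\frac{(1 - \bar{a} z)^2}{1 - |a|^2} . \]

Returning to our inequality, we use this to find

\[ \begin{aligned}
|f(a)|^p &\le \frac{1}{r^2} \int_{K(a,r)} |f(z)|^p \left| \frac{(1 - \bar{a} z)^2}{1 - |a|^2} \right|^{-2} dm(z) \\
         &= \frac{1}{r^2} \int_{K(a,r)} |f(z)|^p \, \frac{(1 - |a|^2)^2}{|1 - \bar{a} z|^4} \, dm(z) .
\end{aligned} \]

Because $\rho(a,z) < r$, by Lemma 3.3.6 we have

\[ |f(a)|^p \le C \int_{K(a,r)} |f(z)|^p \, \frac{dm(z)}{m(K(a,r))} , \]

where $C$ depends on $r$ but not on $a$ or $f$.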


□ 
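The first steps of this proof (the area mean-value identity $r^2 f(0) = \int_{r\mathbb{D}} f \, dm$ and the bound $|f(0)|^p \le r^{-2} \int_{r\mathbb{D}} |f|^p \, dm$) can be illustrated with a simple quadrature. This is a sketch under stated assumptions: the test function $f(z) = 1 + 2z - z^3$ and the name `normalized_disc_integral` are hypothetical, and the measure is taken as $dm = \rho \, d\rho \, d\theta / \pi$ so that $m(\mathbb{D}) = 1$.

```python
import cmath
import math


def normalized_disc_integral(g, r: float, n_radial: int = 400, n_angular: int = 64):
    """Midpoint-rule approximation of the integral of g over rD with dm = rho d rho d theta / pi."""
    total = 0
    h = r / n_radial
    for i in range(n_radial):
        rad = (i + 0.5) * h
        # Sum of g over equally spaced points on the circle of radius rad.
        ring = sum(
            g(cmath.rect(rad, 2 * math.pi * (j + 0.5) / n_angular))
            for j in range(n_angular)
        )
        total += ring * (2 * math.pi / n_angular) * rad * h / math.pi
    return total


def f(z: complex) -> complex:
    """Hypothetical analytic test function with f(0) = 1."""
    return 1 + 2 * z - z**3
```

With $r = 0.7$, `normalized_disc_integral(f, 0.7)` should be close to $r^2 f(0) = 0.49$, and for $p = 2$ the normalized average of $|f|^2$ works out to $1 + 2r^2 + r^6/4$, which exceeds $|f(0)|^2 = 1$ as the inequality requires.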


The following lemma can be found in, e.g., [2].

Lemma  3.4.2  Let  f  be  analytic  in  D  and  suppose  that  for  some  a  6  D  and  R  G  (0, 1  —  |a|), 
that \f(z)\  <  M  for  all  z  6  B(a,  R).  Then,  for  each  n  =  1,2,...  and  for  every  z  £  B(a,  ) , 

l/wMI  <  M  (J)”. 


The following lemma, to be used in the proof of the main theorem, is an elaboration
of a step in the proof of a lemma in [20]. Note that in the usage of the result in [20], the
value of $\epsilon$ was not restricted. This implies that there may be a better proof of the lemma
which will not incur a restriction on the value of $\epsilon$.


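Lemma 3.4.3 Fix $p > 0$ and $r \in (0,1)$. Then there exists $C < \infty$ depending only on $p$
and $r$ such that

\[ |f(z) - f(w)|^p \le C \epsilon^p \int_{K(z,r)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(z,r))} \]

whenever $\rho(z,w) < \epsilon < \frac{1}{3}$ for all $f$ analytic in $\mathbb{D}$.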

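Proof. Given $r$, from Lemma 3.3.1, we know that

\[ B(z,R) \subset K(z,r) \subset \mathbb{D} , \]

where $R = r(1 - |z|^2)/2 < \frac{r(1 - |z|^2)}{1 + r|z|}$. Using Lemma 3.3.4 above, we know that for all
$a \in B(z, R/2)$, the set inclusion $K(a, r/2) \subset K(z, r)$ holds, and so from Lemma 3.4.1, we
have that for all $a \in \mathbb{D}$,

\[ |f(a)|^p \le C \int_{K(a,r/2)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(a,r/2))} , \]

where $C$ depends on $r/2$ but not $a$ or $f$. Using Lemmas 3.3.7 and 3.3.9, we then have

\[ \begin{aligned}
|f(a)|^p
 &\le C \left( \frac{16(1 - (r/2)^2\epsilon^2)^2}{(1 - \epsilon^2)^2 (1 - (r/2)^2)^2} \right) \int_{K(a,r/2)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(z,r/2))} \\
 &\le C \left( \frac{16(4 - r^2\epsilon^2)^2}{(1 - \epsilon^2)^2 (4 - r^2)^2} \right) \left( \frac{(1 - r/2)^2}{(1/2)^2 (1 - r)^2} \right) \int_{K(a,r/2)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(z,r))} \\
 &\le C \left( \frac{16(4 - r^2\epsilon^2)^2}{(1 - \epsilon^2)^2 (2 + r)^2} \right) \int_{K(z,r)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(z,r))} ,
\end{aligned} \]

where this last step incorporates the factor $(1 - r)^{-2}$ into $C$. Using that $\epsilon < \frac{1}{3}$, we calculate

\[ |f(a)|^p \le 81\,C \int_{K(z,r)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(z,r))} , \]

to get a bound independent of $\epsilon$. Incorporating the constant 81 into $C$, this gives

\[ \sup_{a \in B(z,R/2)} |f(a)| \le M \]

where

\[ M = \left[ C \int_{K(z,r)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(z,r))} \right]^{1/p} . \]

Since $f$ is analytic in $\mathbb{D}$, it has a Taylor series expansion about $z$ valid in the region
$|z - w| < 1 - |z|$. From Lemma 3.3.8, we know that for our choice of $\epsilon$, $\rho(z,w) < \epsilon$ implies
$|z - w| < 1 - |z|$, and so we may write

\[ f(w) = f(z) + \sum_{k=1}^{\infty} f^{(k)}(z) \, \frac{(w - z)^k}{k!} . \]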

This  implies  that  (using  Lemma  3.4.2) 

l/(«0-  f(*)\  = 

< 

< 

< 

Using  Lemma  3.3.9,  we  have 


<  Me(e1'  -  1)  , 


k  =  1 


kl 

(w  —  z)k 


k\ 


8  V  (  \w  -  z\  \"  1 


*=i 


(1  -  \z\2)J  k\ 


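where the fact that $\epsilon < \frac{1}{3}$ is used to obtain a bound independent of $\epsilon$. Therefore, after
incorporating constants,

\[ |f(w) - f(z)|^p \le C \epsilon^p \int_{K(z,r)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(z,r))} , \]

where $C$ depends only on $r$ and $p$.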


The following two lemmas simply show the existence and finite cardinality of the sets
$\{a_{n,k}\}$ needed in the main result.

Lemma 3.4.4 Let $S \subset \mathbb{D}$ be a compact set and let $\epsilon \in (0,1)$ be given. If the indexed set
$\{a_k\} \subset S$ has the property that $\rho(a_k, a_j) \ge 2\epsilon$ whenever $j \ne k$, then $\#\{a_k\} < \infty$.

Proof. Since $S$ is a compact subset of $\mathbb{D}$, there exists $M < 1$ such that $|z| \le M$ for
all $z \in S$. That is, $S \subset M\mathbb{D}$.

The condition $\rho(a_k, a_j) \ge 2\epsilon$ excludes the possibility of any $a_j$ being in $K(a_k, 2\epsilon)$
($j \ne k$), so the sets $K(a_j, \epsilon)$, $j = 1, 2, \ldots$ are pairwise disjoint. For $a_k \in S \subset M\mathbb{D}$, the
Lebesgue measure, $m(K(a_k,\epsilon))$, of this region is given by Lemma 3.3.2 according to

\[ m(K(a_k,\epsilon)) = \frac{\epsilon^2 (1 - |a_k|^2)^2}{(1 - \epsilon^2|a_k|^2)^2} . \]

Using the fact that $1 - |a_k|^2 \ge 1 - M^2$ and that $\frac{1}{(1 - \epsilon^2|a_k|^2)^2} \ge 1$ gives a lower bound of

\[ m(K(a_k,\epsilon)) \ge \epsilon^2 (1 - M^2)^2 > 0 . \]

Furthermore,  for  M  sufficiently  close  to  1,  we  have 

1  £2(1  -  M2)2 

m(MDfl%,e))  >  -m(K(ak,e))  >  - - -  >0  . 

Since $m(M\mathbb{D}) = M^2 < 1$, we have that at most a finite number of $a_k$ satisfying the
separation condition may be in $M\mathbb{D}$ and hence in $S$. Therefore, $\#\{a_k\} < \infty$. □

Note that in the following lemma, it may be possible to replace the condition $\epsilon < \delta$
with a weaker one. However, even if it is possible, it would require a much more difficult
proof, since the shape of the set, $S$, becomes important.

Lemma 3.4.5 Let $S \subset \mathbb{D}$ be a compact set and let $0 < \epsilon < \delta < 1$ be given. Then there
exists an indexed set, $\{a_k\} \subset S$, such that $\rho(a_k, a_j) \ge \epsilon$ for all $k \ne j$ and such that
$S \subset \bigcup_k K(a_k, \delta)$. This set will also have the property that $\#\{a_k\} < \infty$.

Proof. (By construction) Choose $a_1 \in S$. Iteratively choose $a_k \in S \setminus \bigcup_{i=1}^{k-1} K(a_i, \delta)$.
By this choice, $\rho(a_k, a_i) \ge \delta > \epsilon$ for each $i = 1, 2, \ldots, k-1$. By Lemma 3.4.4, we know
that any set $\{a_k\}$ with the property that $\rho(a_k, a_j) \ge \epsilon$ for $k \ne j$ is finite, so that for some
finite $N$, $S \setminus \bigcup_{k=1}^{N} K(a_k, \delta) = \emptyset$, which implies that $S \subset \bigcup_{k=1}^{N} K(a_k, \delta)$. □
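The construction in this proof is a greedy selection, and it can be sketched on a finite sample of a compact set. Everything below is illustrative: the closed annulus $0.5 \le |z| \le 0.9$ stands in for $S$, the polar grid is a hypothetical discretization, and the names `sample_annulus` and `greedy_centres` are not from the text.

```python
import cmath
import math


def rho(a: complex, z: complex) -> float:
    """Pseudo-hyperbolic distance rho(a, z) = |a - z| / |1 - conj(a) z|."""
    return abs(a - z) / abs(1 - a.conjugate() * z)


def sample_annulus(inner=0.5, outer=0.9, n_radial=9, n_angular=72):
    """Finite polar-grid sample of the closed annulus inner <= |z| <= outer."""
    return [
        cmath.rect(inner + (outer - inner) * i / (n_radial - 1),
                   2 * math.pi * j / n_angular)
        for i in range(n_radial)
        for j in range(n_angular)
    ]


def greedy_centres(points, delta: float):
    """Keep a point only if it lies outside K(a, delta) for every centre a chosen so far."""
    centres = []
    for z in points:
        if all(rho(a, z) >= delta for a in centres):
            centres.append(z)
    return centres
```

By construction the selected centers are pairwise at pseudo-hyperbolic distance at least $\delta$ (hence at least any $\epsilon < \delta$), and every sample point lies in some $K(a_k, \delta)$, mirroring the two properties claimed in the lemma.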


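The next lemma simply removes some calculations from the proofs of the following
theorem and the main result.

Lemma 3.4.6 Let $\{S_n\}$ satisfy Condition A. Then $\frac{1 - r_n}{1 - M_n} \xrightarrow{u} 1$.

Proof. For each $\theta \in [-\pi, \pi)$, we have

\[ \begin{aligned}
\frac{M_n(\theta) - r_n(\theta)}{1 - M_n(\theta) r_n(\theta)}
 &\ge \frac{M_n(\theta) - r_n(\theta)}{1 - r_n^2(\theta)} \\
 &= \frac{(1 - r_n(\theta)) - (1 - M_n(\theta))}{(1 - r_n(\theta))(1 + r_n(\theta))} \\
 &= \frac{1}{1 + r_n(\theta)} \left( 1 - \frac{1 - M_n(\theta)}{1 - r_n(\theta)} \right) \\
 &\ge 0 .
\end{aligned} \]

Using (4) from Condition A, this implies $\frac{1}{1 + r_n} \left( 1 - \frac{1 - M_n}{1 - r_n} \right) \xrightarrow{u} 0$. Since $r_n \xrightarrow{u} 1$, we have
that $\left( 1 - \frac{1 - M_n}{1 - r_n} \right) \xrightarrow{u} 0$, which implies $\frac{1 - M_n}{1 - r_n} \xrightarrow{u} 1$. Therefore, $\frac{1 - r_n}{1 - M_n} \xrightarrow{u} 1$. □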


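Lemma 3.4.7 Let $\{S_n\}$ satisfy Condition A. Then

\[ \limsup_{n \to \infty} \frac{1 - \underline{r}_n}{1 - \overline{M}_n} \le \frac{2}{1 - C_S} , \]

where $C_S < 1$ is defined as in Condition A.

Proof. Since $0 < \underline{r}_n \le \overline{M}_n < 1$, we get

\[ \frac{2(1 - \overline{M}_n)}{1 - \underline{r}_n}
   \ge \frac{(1 - \overline{M}_n)(1 + \underline{r}_n)}{1 - \overline{M}_n \underline{r}_n}
   = 1 - \frac{\overline{M}_n - \underline{r}_n}{1 - \overline{M}_n \underline{r}_n} , \]

and hence

\[ \begin{aligned}
\liminf_{n \to \infty} \frac{1 - \overline{M}_n}{1 - \underline{r}_n}
 &\ge \frac{1}{2} \liminf_{n \to \infty} \left[ 1 - \frac{\overline{M}_n - \underline{r}_n}{1 - \overline{M}_n \underline{r}_n} \right] \\
 &= \frac{1}{2} \left[ 1 - \limsup_{n \to \infty} \frac{\overline{M}_n - \underline{r}_n}{1 - \overline{M}_n \underline{r}_n} \right] \\
 &\ge \frac{1 - C_S}{2} > 0 . \qquad (6)
\end{aligned} \]

Therefore,

\[ \limsup_{n \to \infty} \frac{1 - \underline{r}_n}{1 - \overline{M}_n} \le \frac{2}{1 - C_S} . \]

□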


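The following lemma shows that, instead of using the sequences of circles defined by
$re^{i\theta}$ to determine the norm of a function $f \in H^p(\mathbb{D})$, as in

\[ \|f\|_p^p = \lim_{r \to 1} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(re^{i\theta})|^p \, d\theta , \]

suitably chosen sequences of functions, $r_n(\theta)$ (not necessarily continuous curves), can
be used instead, as in

\[ \|f\|_p^p = \lim_{n \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(r_n(\theta)e^{i\theta})|^p \, d\theta . \]

Lemma 3.4.8 Let $0 < p < \infty$ be fixed and for each $n \in \mathbb{Z}^+$, let $r_n : [-\pi, \pi) \to [0,1)$ be
Lebesgue measurable functions such that $r_n(\theta) \to 1$ a.e. on $[-\pi, \pi)$ as $n \to \infty$. Then

\[ \lim_{n \to \infty} \int_{-\pi}^{\pi} |f(r_n(\theta)e^{i\theta})|^p \, \frac{d\theta}{2\pi}
   = \lim_{r \to 1} \int_{-\pi}^{\pi} |f(re^{i\theta})|^p \, \frac{d\theta}{2\pi}
   = \|f\|_p^p \]

for all $f \in H^p(\mathbb{D})$.

Proof. Let $f \in H^p(\mathbb{D})$ and define $\tilde{f}$ by $\tilde{f}(e^{i\theta}) = \lim_{r \to 1} f(re^{i\theta})$. From
Theorem 3.2.2, we know that this limit exists a.e. on $\theta \in [-\pi, \pi)$, that $\tilde{f} \in L^p(\partial\mathbb{D})$, and also
that $\|f\|_p^p = \frac{1}{2\pi} \int_{-\pi}^{\pi} |\tilde{f}(e^{i\theta})|^p \, d\theta$. From Theorem 3.2.1, we have that $F \in L^p(\partial\mathbb{D})$, where $F$
is defined by

\[ F(e^{i\theta}) = \sup_{0<r<1} |f(re^{i\theta})| . \]

Since $r_n(\theta) \to 1$ for a.e. $\theta \in [-\pi, \pi)$, we have that $f(r_n(\theta)e^{i\theta}) \to \tilde{f}(e^{i\theta})$ for a.e.
$\theta \in [-\pi, \pi)$. Also, $|f(r_n(\theta)e^{i\theta})|$ is bounded above by $F(e^{i\theta})$. Therefore, by the Lebesgue
Dominated Convergence Theorem, we have that

\[ \lim_{n \to \infty} \int_{-\pi}^{\pi} |f(r_n(\theta)e^{i\theta})|^p \, \frac{d\theta}{2\pi}
   = \int_{-\pi}^{\pi} |\tilde{f}(e^{i\theta})|^p \, \frac{d\theta}{2\pi}
   = \|f\|_p^p . \]

□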
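Lemma 3.4.8 can be illustrated numerically for $p = 2$, where the norm of a polynomial is known in closed form from its coefficients. The following sketch makes several assumptions that are not in the text: the test function $f(z) = 1 + 2z - z^3$ (so $\|f\|_2^2 = 1 + 4 + 1 = 6$), the particular non-circular curves $r_n(\theta) = 1 - (1.5 + \sin 3\theta)/(n+2)$, and the helper names.

```python
import cmath
import math


def curve_mean(f, radius_fn, p: float, samples: int = 4096) -> float:
    """Approximate (1 / 2 pi) * integral of |f(r(theta) e^{i theta})|^p over [-pi, pi)."""
    total = 0.0
    for j in range(samples):
        theta = -math.pi + 2 * math.pi * (j + 0.5) / samples
        total += abs(f(cmath.rect(radius_fn(theta), theta))) ** p
    return total / samples


def f(z: complex) -> complex:
    """Hypothetical test polynomial; its H^2 norm squared is 1 + 4 + 1 = 6."""
    return 1 + 2 * z - z**3


def r_n(n: int):
    """Hypothetical radius functions with values in [0, 1) that tend to 1 as n grows."""
    return lambda theta: 1 - (1.5 + math.sin(3 * theta)) / (n + 2)
```

Evaluating `curve_mean(f, r_n(n), 2)` for increasing `n` should give values approaching 6, even though the curves $r_n(\theta)e^{i\theta}$ are not circles.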


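Although it is not obvious from its statement, what the following lemma shows is
that given the constraint of $\rho(M_n, r_n) = \frac{M_n - r_n}{1 - M_n r_n} \xrightarrow{u} 0$, the quantity $[M_n - r_n](\theta)$ approaches
0 faster than the radius of the ball $B\left( M_n(\theta), \frac{r(1 - M_n^2(\theta))}{1 + rM_n(\theta)} \right) \subset K(M_n(\theta), r)$. Later, this
will allow us to get an upper bound on the number of points $a_k$ that can lie within an
appropriately defined set $S_n$, constrained by $\rho(a_k, a_j) \ge \epsilon$, that varies with $\epsilon$ instead of
$\epsilon^2$. The corollary that follows is provided to make use of this lemma more obvious in the
context in which it is used.

Lemma 3.4.9 Let $\{S_n\}$ satisfy Condition A with $r_n$ and $M_n$ as defined in that condition.
Fix $r \in (0,1)$. Then

\[ \frac{M_n - r_n}{\left( \dfrac{r(1 - M_n^2)}{1 + rM_n} \right)} \xrightarrow{u} 0 . \]

Proof. For each $\theta \in [-\pi, \pi)$,

\[ \begin{aligned}
\frac{M_n(\theta) - r_n(\theta)}{\left( \dfrac{r(1 - M_n^2(\theta))}{1 + rM_n(\theta)} \right)}
 &= \frac{(M_n(\theta) - r_n(\theta))(1 + rM_n(\theta))}{r(1 - M_n^2(\theta))} \\
 &= \frac{M_n(\theta) - r_n(\theta)}{1 - M_n(\theta) r_n(\theta)} \cdot \frac{1 - M_n(\theta) r_n(\theta)}{1 - M_n(\theta)} \cdot \frac{1 + rM_n(\theta)}{r(1 + M_n(\theta))} \\
 &\le \frac{M_n(\theta) - r_n(\theta)}{1 - M_n(\theta) r_n(\theta)} \cdot \frac{1 - r_n^2(\theta)}{1 - M_n(\theta)} \cdot \frac{1 + rM_n(\theta)}{r(1 + M_n(\theta))} \\
 &= \frac{M_n(\theta) - r_n(\theta)}{1 - M_n(\theta) r_n(\theta)} \cdot \frac{1 - r_n(\theta)}{1 - M_n(\theta)} \cdot \frac{(1 + r_n(\theta))(1 + rM_n(\theta))}{r(1 + M_n(\theta))} .
\end{aligned} \]

We have, from Lemma 3.4.6, that $\frac{1 - r_n}{1 - M_n} \xrightarrow{u} 1$, and by inspection, $\frac{(1 + r_n)(1 + rM_n)}{r(1 + M_n)} \xrightarrow{u} \frac{1 + r}{r}$ since
$r_n \le M_n < 1$ and $r_n \xrightarrow{u} 1$, so that $M_n \xrightarrow{u} 1$ as $n \to \infty$. By (4) of Condition A, we have
that $\frac{M_n - r_n}{1 - M_n r_n} \xrightarrow{u} 0$. Therefore

\[ \frac{M_n - r_n}{\left( \dfrac{r(1 - M_n^2)}{1 + rM_n} \right)} \xrightarrow{u} 0 . \]

□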

Lemma 3.4.10 will require two sets, $ZS_n(\theta, \Delta\theta)$ and $\Delta\theta_n$, and one function, $\Delta\phi_n(s)$,
which are rather geometric in nature and somewhat confusing. In order to improve the
flow of the proof of Lemma 3.4.10, these sets will be defined and discussed here, before the
statement of the lemma.

Let  {5„}  satisfy  Condition  A.  Fix  0  <  r  <  1  and  0  <  t  <  1.  Define  r'n:  [ — 7r,  7t)  — ► 
[0, 1)  by  r'n(0)  =  The  set  ZSn(0,  A0)  is  defined  by 

ZSn(0,A0)  =  {ze5„  :  \0-a,vgz\<tA0}  .  (7) 

The  set  A 9n  is  defined  by 

A 8n  =  sup{A0  :  Z5n(args,  A9)  C  i?(s,  r^(args))  for  all  5  £  Sn}  .  (8) 

Note  that  A 6n  may  not  be  defined  for  some  n.  The  function  A <j>n  :  D  — >  [0,7r]  is  defined 
by 


A 4>n(s)  —  one-half  the  angle  subtended  by  B(s,  r^(arg  5))  relative  to  the  origin  .  (9) 

To explain the significance of these sets, the set $ZS_n(\theta, \Delta\theta)$ is the intersection between the
set $S_n$ and a wedge of the unit disk, $\mathbb{D}$, of angular width $2t\Delta\theta$ centered at angle $\theta$. The
value $\Delta\theta_n$ represents the largest angular half-width of the set $ZS_n(\arg s, \Delta\theta)$ which is a
subset of the set $K(s, r_n'(\arg s))$ for $s \in S_n$, regardless of which $s \in S_n$ is chosen.

Lemma  3.4.10  Fix  0  <  r  <  1  and  0  <  t  <  1.  Let  {5n}  satisfy  Condition  A .  Define 
rn:[~  7r?7r)  '  [0,1)  by  r'n(8 )  =  .  Define  the  sets  ZSn(0,  A9)  and  A6n  and  the 

function  A<j>n(s)  by  (1))  (8),  and  (9),  respectively.  Then  there  exists  N  <  00  such  that 
A 0n  is  well-defined  and  A9n  >  tA(j)n(s)  for  every  n  >  N  and  s  £  Sn. 

Proof.  From  Lemma  3.4.9  above,  we  know  that  Fn-Sn  Ao.  Therefore,  there  exists 

rn 

some  N  such  that  for  every  n  >  N,  A0n  is  well-defined.  That  is,  for  every  n  >  N  and 




a  G  S„,  Z5„(0,  A9)  C  B(s,r'n(aTgs)).  Since  r„A  1,  and  ^  ^  1  as  A0  -  0, 

we  have  that  A 9n  —  2inf,es„  Ad>„(s).  Therefore,  there  exists  some  iV  such  that  for  all 
n  >  N,  A >  tAcj>n(s )  for  any  s  G  51,,.  □ 


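The following lemma is later used to establish a bound on the angle subtended on a given
radius, $r_n$, by a ball, $K(z,s)$, about a point $z$ on that radius.

Lemma 3.4.11 Let $0 < s < r_n < 1$ and suppose $\rho(r_n, r_n e^{i\theta}) \le s$. Then,

\[ \sin\theta \le \frac{s\,(1 - r_n^4)}{(1 - s^2)\, r_n} . \]

Proof. By the definition of $\rho$ (see Definition 3.1.2), we see that

\[ s \ge \rho(r_n, r_n e^{i\theta}) = \frac{|r_n - r_n e^{i\theta}|}{|1 - r_n^2 e^{i\theta}|} \ge 0 , \]

which leads to

\[ s^2 \ge \frac{2 r_n^2 (1 - \cos\theta)}{1 + r_n^4 - 2 r_n^2 \cos\theta} . \]

Solving for $\cos\theta$, we find

\[ \cos\theta \ge \frac{2 r_n^2 - s^2 (1 + r_n^4)}{2 r_n^2 (1 - s^2)} , \]

which leads to

\[ \begin{aligned}
\sin^2\theta
 &\le 1 - \left( \frac{2 r_n^2 - s^2 (1 + r_n^4)}{2 r_n^2 (1 - s^2)} \right)^2 \\
 &= \frac{s^2 \left[ 4 r_n^2 (1 - r_n^2)^2 - s^2 (1 - r_n^4)^2 \right]}{4 r_n^4 (1 - s^2)^2} \\
 &\le \frac{s^2 \left[ 4 r_n^2 (1 - r_n^4)^2 - s^2 (1 - r_n^4)^2 \right]}{4 r_n^4 (1 - s^2)^2} \\
 &= \frac{s^2 (4 r_n^2 - s^2)(1 - r_n^4)^2}{4 r_n^4 (1 - s^2)^2} \\
 &\le \frac{s^2 (4 r_n^2)(1 - r_n^4)^2}{4 r_n^4 (1 - s^2)^2} \\
 &= \frac{s^2 (1 - r_n^4)^2}{r_n^2 (1 - s^2)^2} .
\end{aligned} \]

Therefore

\[ \sin\theta \le \frac{s\,(1 - r_n^4)}{(1 - s^2)\, r_n} . \]

□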
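The bound of Lemma 3.4.11 can be checked on a grid of admissible $(r_n, s)$ pairs. This sketch assumes the extremal angle is the one at which $\rho(r_n, r_n e^{i\theta}) = s$ exactly, obtained by solving the displayed inequality for $\cos\theta$ with equality; the function names are hypothetical.

```python
import cmath
import math


def rho(a: complex, z: complex) -> float:
    """Pseudo-hyperbolic distance rho(a, z) = |a - z| / |1 - conj(a) z|."""
    return abs(a - z) / abs(1 - a.conjugate() * z)


def extreme_angle(r: float, s: float) -> float:
    """Angle theta in [0, pi] with rho(r, r e^{i theta}) = s, for 0 < s < r < 1."""
    cos_theta = (2 * r * r - s * s * (1 + r**4)) / (2 * r * r * (1 - s * s))
    return math.acos(cos_theta)


def sine_bound(r: float, s: float) -> float:
    """Right-hand side of the lemma: s (1 - r^4) / ((1 - s^2) r)."""
    return s * (1 - r**4) / ((1 - s * s) * r)


def worst_case(grid: int = 19):
    """Max of sin(theta) - bound and max of |rho - s| over a grid with 0 < s < r < 1."""
    max_excess = -float("inf")
    max_rho_err = 0.0
    for i in range(1, grid + 1):
        r = 0.05 * i
        for k in range(1, 10):
            s = 0.1 * k * r
            theta = extreme_angle(r, s)
            max_rho_err = max(max_rho_err, abs(rho(r, cmath.rect(r, theta)) - s))
            max_excess = max(max_excess, math.sin(theta) - sine_bound(r, s))
    return max_excess, max_rho_err
```

A non-positive first return value indicates the sine bound held at every grid point, and a tiny second value confirms that the angles used really sit on the boundary $\rho = s$.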


The following theorem is an elaboration of a step in the proof of Theorem 3.1.3.
In that proof, the step was not justified, and the $\epsilon$ dependency was not revealed. I can
conceive of no way to remove the $\epsilon$ dependency completely, and this brings into doubt the
validity of Theorem 3.1.3 for values $0 < p < 1$. I suspect that it is correct for these
values. However, the proof might have to be different.

Theorem 3.4.12 Fix $0 < \epsilon < 1$ and $0 < s < 1$. Let $\{S_n\}$ satisfy Condition A. Then for
every set $\{a_{n,k}\}$ such that $a_{n,k} \in S_n$ and $\rho(a_{n,k}, a_{n,j}) \ge \epsilon$ for all $j \ne k$, there exist positive
constants $M$ and $C$ depending only on $s$ and $C_S$, where $C_S$ is as defined in Condition A,
such that

\[ \sum_k \int_{K(a_{n,k},s)} |f(\zeta)|^p \, dm(\zeta) \le \frac{C}{\epsilon} \int_{A_n} |f(\zeta)|^p \, dm(\zeta) \qquad (10) \]

is true for all $n > M$ and all $f \in H^p(\mathbb{D})$, where $A_n$ is any annulus containing each set in
the collection $\{K(a_{n,k}, s)\}_{k=1}^{k(n)}$.

Proof.  First,  an  upper  bound  will  be  obtained  on  the  number  of  sets  K(an>k,s)  in 
which  any  point  in  annulus  An  can  be  contained.  This  upper  bound  will  be  used  to  limit 
the  magnitude  of  the  sum  relative  to  the  integral  over  An  in  (10). 

The  angle  subtended  by  the  set  B  Mn(9 ),  ^~~v/n(^j))  relafiye  to  the  point  at  the 
origin  is  asymptotically  equal  to  ' (^)) •  Given  our  constraint  that  rn^+ 1  and 

i - m rf  and  >  e(l  —  M^(0)),  we  can  conclude  from  Lemma  3.4.10  that 

there  exists  some  M  <  oo  such  that  for  all  n  >  M  and  each  aUik  E  An,  the  set  K(an) e)  D 
B(z,  covers  a  wedge  of  An  of  angular  width  greater  than  e(l  —  M^{9))  >  e(l  — 

Mn 2)  >  0  from  which  all  other  an> k  are  excluded. 

Next,  we  need  to  determine  the  maximum  number  of  an>fc  that  can  be  in  the  ball 
K(z,s).  This  will  be  done  by  using  the  above  exclusion  angle  and  determining  the  maxi¬ 
mum  angle  that  can  be  covered  on  the  “worst  case”  of  \z\  =  7v  Assume  s  =  p(rn,  rnetg). 
From  Lemma  3.4.11,  we  then  have,  sin0  <  s  Since  — = 0  and  sin  8  ^  9 

7  7  —  (1  -S'1)  rjn_  r_a_ 




as  6 
0  < 
20  < 


0,  we  can  assume  that  for  M  sufficiently  large  and  all  n  >  M,  the  inequality 
-  will  be  true.  That  is,  the  angle  subtended  by  Ii(z,s)  will  be  less  than 


(l-s2)  Ij;. 

4s  (i-£s.4) 


—  (l-S2) 

We  have  bounded  the  angle  subtended  by  the  set  K(z,s)  D  An  above  and  and  have 
bounded  below  the  angle  of  Sn  completely  covered  by  the  set  K(anik,e).  Therefore,  we 
may  obtain  an  upper  bound  on  the  number  of  points  an  Jt  in  the  set  K(z,s )  by  the  ratio 
of  these  two  angles,  given  by 


/  45  (1—  £^_4)\ 

V(l-‘2)  r.  ) 

_  If 

4s  W 

e(l  -Mn2) 

■  A 

i-W  \ 

<  -(^ 
e  \1  -  s2 


1  ~  rn 


JV»(1  -  Mn  ), 
'  1~2^_  \ 
.InC1  ~  Mn)J 


Since  r„  — *  1  and  by  Lemma  3.4.7,  limsupn_OQ  we  know  that  for  M  suffi¬ 

ciently  large  and  all  n  >  M,  the  inequality  ^J=^-  <  — 2C  +  1  will  be  true.  Therefore,  for 
M  sufficiently  large,  the  number  of  points  an>t  in  the  set  K(z,s),  denoted  N,  is  bounded 
above  by 


N  < 


1  (  8s 


€  Vl  —  Si 


i  -  a 


+  i  • 


This  gives  that  for  all  n  sufficiently  large, 

H  /  I/(C)IP  dm(Q  <7  /  I/(C)IP<MC), 

l  JK(ak,s )  £  3  A  n 


where  C  =  (j^j  +  l)  is  independent  of  e  and  An  is  any  annulus  which  contains 

the  set  {K(an]k,s)}.  □ 


3.5  Main  result 

We can now prove the main result (Theorem 3.1.4). The method of proof is to show
that there is a $\delta$ small enough for the desired inequalities to be true. This is done by
showing the inequalities depend on $\delta$ and some constant $C$ independent of $\delta$, allowing $\delta$ to
be chosen such that we have the desired result.

Proof (Theorem 3.1.4). By Lemma 3.4.5, for any set $\{S_n\}$ as defined in the
statement of this theorem, there exists a doubly indexed set $\{a_{n,k}\}$ which satisfies the
conditions that for each $n$, $\rho(a_{n,j}, a_{n,k}) \ge \epsilon$ for all $j \ne k$ and $S_n \subset \bigcup_k K(a_{n,k}, \delta)$. By
Lemma 3.4.4, we have that $\#\{a_{n,k}\}_{k=1}^{k(n)}$ will be finite for each $n$.

Assume that for each $n$, the set $\{a_{n,k}\}$ is ordered by the relation $\arg a_{n,j} \le \arg a_{n,k}$
for all $j < k$. Form a set, $\Gamma_n$, according to

\[ \Gamma_n = \bigcup_{j=1}^{k(n)} \left\{ z : z = \left[ (1-t)|a_{n,j}| + t|a_{n,j+1}| \right] e^{i[(1-t)\arg a_{n,j} + t \arg a_{n,j+1}]} , \; t \in [0,1] \right\} , \]

where $\arg a_{n,k(n)+1} = \arg a_{n,1} + 2\pi$ and $|a_{n,k(n)+1}| = |a_{n,1}|$.

Define a function, $\gamma_n : [-\pi, \pi) \to \mathbb{D}$, by $\gamma_n(\theta) = z$ for some $z \in \Gamma_n$ such that $\arg z = \theta$.
In the (at most a finite number of) cases where $\gamma_n(\theta)$ is ambiguously defined, its value may
be chosen arbitrarily. The function $\gamma_n$ is piecewise continuous and bounded on a finite
interval, and is hence Lebesgue measurable.

Fix  5  E  (0, 1).  By  Lemma  3.4.3,  there  exists  a  constant  C  <  00  depending  only  on 
C5,  p,  and  s  such  that 


\f(z)-f(0\P 


< 


C6P 


dm 

m(K((,s)) 


whenever  p{z,Q  <  8  <  min{<$r,  |}.  Multiply  each  side  of  this  inequality  by  Xs(z>  C)/(^(l  ~ 
|£D),  where  xs  is  defined  by 


Xs(z,0 


: 


if  p(z,C)  <  6 

otherwise 


Let  z  =  7 n(0)  and  integrate  both  sides  of  the  inequality  with  respect  to  dO  to  get 


<  cspJ^ 

<  csp  r 


xMfhQ  [  |  f,p  dm 

tf(l-KI)  V,)'  '  m(K(C,s)) 
X<(7n(g),C)  j0  f  |  , dm 
«(l-|Cl)  m(K(Cs))- 


For  any  fixed  (  6  D  and  for  n  sufficiently  large,  from  Lemma  3.3.5  we  have  that  there 
is  a  c'  depending  only  on  s  such  that  the  maximum  subtended  angle  A 9  in  which  7„(#) 
could  be  within  K((,s )  is  such  that  A 6  <  min{c'(l  -  |d),c'(l  -  r„)}  <  c'(  1  -  |£|).  Since 




X<5(7n(#)?C)  can  only  be  non-zero  within  this  angle,  and  when  it  is  non-zero,  it  takes  on 
the  value  1,  we  have 


*(1-ICI) 


d9  < 


1 

*(1-ICI) 


d6  < 


d_ 

6  ' 


Incorporating  c!  into  C,  we  have 


£  -  nor  & 


<  C6 r-1 


dm 

m(K(C,a))  ‘ 


(11) 


Next,  define  a  positive  Borel  measure  over  D,  —  R+,  by 


where 


E(i 


k~i 


K,Jfc|)<5<5a„,t(z) 


1  if  2  =  a„tk 

0  otherwise 


Integrate  both  sides  of  (11)  over  D  with  respect  to  dfin(()  to  get 

"  X«(7.W.O, 


If 

J3  j  —  7T 


-|/(7nW)-/(C)|P^^«(0 


*(i-ICI) 

<  C7>W  [  \m\P?Z(9  v/MO 

Jm  Jk(q,s)  m{  A  (C,  s)) 

<  I  i/(ormr,£(0  .. 

j.  — 1  JK(anjk,s )  77l(A  (fln>fc,  S)) 

<  «?-'  £  «(i  -  K,.|)  /  1/(01' 

j  JK(an>k,s)  TTlyli  yCl 


Since 


(l-«2|fln,fc|3)2 


< 


m(K(^s))  ~  s*(l-\an,kr-)t  -  (1-K,*|)2 

incorporate  constants  to  give 


,  where  cf  = 


•)) 


depends  only  on  s,  we  can 


If 

J3  J  —  7T 


'  xsfyJfhf) 

«(i-ICI) 


l/(7n(0))  -  /(C)N^Mn(C) 


< 


fc(n)  c  /■ 

ci’-'Ei,  n  /  1/(01' ‘<">(0 

fc  =  l  (1  _  |  &n,k  |  j  J K(an,k,>) 




By  the  definition  of  Mn,  we  know  that  ^  <  ^=,  giving 


if 


XdjjniOfO 

«(1-ICI) 


l/(7n(^))-/(C)|P^^n(C)  < 


c&  r 

~  Mn  k=1  J K(an<k ,«) 


\f(0\p  dm(Z) 


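Since $s$ and $\epsilon$ are fixed and the sets $K(a_{n,k}, \epsilon/2)$ are disjoint, by Theorem 3.4.12 we know
there exists some constant $d$ depending only on $s$ and $C_S$ such that

\[ \sum_{k=1}^{k(n)} \int_{K(a_{n,k},s)} |f(\zeta)|^p \, dm(\zeta) \le d \, \frac{1}{\epsilon} \int_{A_n(s)} |f(\zeta)|^p \, dm(\zeta) , \]

where $A_n(s)$, defined by

\[ A_n(s) = \{ z \in \mathbb{D} : \underline{r}_n - 2s(1 - \underline{r}_n) \le |z| \le \overline{M}_n + 2s(1 - \overline{M}_n) \} , \]

is an annulus which contains each of the $K(a_{n,k}, s)$ for $k = 1, 2, \ldots, k(n)$. Incorporating
the constant $d$ into $C$, this gives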


If 

«/ ID  «/  — 7T 


*  X*(7n(0),C) 


|/(7n(^))-/(C)|P^^n(C) 


csp€~1  r 

-  i-mljm,^() 

CfsVf- 1  /*Mn+25(l~M„)  1  p  7T 

<  __  /  -  /  |/(re,'*)|pd0rdr 

-  1  -  Mn  Jr.-isd-rj  7T  J —tt  Jl 


< 


< 


C6pe~ 1  /•Mn+2s(i-Mn) 

1  —  Afn  Jrn  — 2s(l-rn) 


/•Mn  +  25( 

/  2||/||Prdr 

«/£»_- 25(1- 

Cbpe~x  rJ^+2s 


C^e"1  rM 

1  -MfJrn- 


2  5 


2||/||Prdr 


<  2(1  + 4a)(l  -  fa) 


CbpC 


1-  Mr 

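Incorporating the constant $2(1 + 4s)$ into $C$ gives

\[ \int_{\mathbb{D}} \int_{-\pi}^{\pi} \frac{\chi_\delta(\gamma_n(\theta), \zeta)}{\delta(1 - |\zeta|)}\, |f(\gamma_n(\theta)) - f(\zeta)|^p \, d\theta \, d\mu_n(\zeta) \le C \delta^p \epsilon^{-1}\, \frac{1 - r_n}{1 - M_n}\, \|f\|_p^p. \tag{12} \]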




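Now it is necessary to break apart the left-hand side integral. We raise each side of (12) to the $1/p$ power and invoke Minkowski’s inequality to get

\[ \left[ \int_{\mathbb{D}} \int_{-\pi}^{\pi} \frac{\chi_\delta(\gamma_n(\theta), \zeta)}{\delta(1 - |\zeta|)}\, |f(\gamma_n(\theta))|^p \, d\theta \, d\mu_n(\zeta) \right]^{1/p} - \left[ \int_{\mathbb{D}} \int_{-\pi}^{\pi} \frac{\chi_\delta(\gamma_n(\theta), \zeta)}{\delta(1 - |\zeta|)}\, |f(\zeta)|^p \, d\theta \, d\mu_n(\zeta) \right]^{1/p} \le C \delta \epsilon^{-1/p} \left( \frac{1 - r_n}{1 - M_n} \right)^{1/p} \|f\|_p. \tag{13} \]

Applying Fubini’s theorem to the first integral of (13), we have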


If 

Jjfr  J  —  tt 


<5(1 -ICI) 

r  rxsMou) 

J  —  TT  */  © 


l/(7n(6,))|p  d,6dpn{Q 

\f(lnmpdPn(Qde 


®  ^(i  —  ICI) 

'  k(n)  Xt(ln{0),an,k) 

..*1) 


=  r‘£;V(i-Kt|)x‘(T”w’“”'‘) 

k= i 


<5(1  -  |On,fc|) 

=  [  Xi{ln(9),an,k)\f{ln(0))\P  dd  . 

*  =  l*'-,r 


l/(7n(^))|P^ 


*(")  ,7T 


Since the sets $K(a_{n,k}, \delta)$ overlap and cover $S_n$, which “surrounds” the origin, we have that


If 


W  Xf(l n(#),C) 


^(n)  /»7T 


^(1  _  |£|)  lMmvv/1  'dedpn(0  =  JfJ^X6{ln(0),an,k)\f(ln(0))\Pdd 


l/(7»(W< 


> 


r  \f(in(e))\pd0. 

J  —  TT 


(14) 


Working  now  with  the  second  integral  of  (13),  we  have 

=  Sjwr  0. 

As before, the subtended angle $\Delta\theta$ for which $\chi_\delta(\gamma_n(\theta), \zeta)$ is non-zero is bounded above by $c'(1 - |\zeta|)$. Therefore,


Xf(7n(0),C) 

<5(1 -ICI) 


de  < 


c 1 

1  ‘ 




This  gives 


< 


< 


< 


cj  jh/toNMo 

-i  Hn) 

~F  'y  y  fJ-n{^n,k)\f{0'n,k)\P 
k  —  l 

0  k= 1 
k(n) 

e'Ed-  \an,k\)\f(an,k)\p  ■ 
k=l 


(15) 


Substituting  (14)  and  (15)  back  into  (13)  gives 


which  implies 


and  also 


r  r* 

i/p 

k(n) 

\  \fhn(0))\pde 

-c' 

2K1  - 

U -TV 

_k~l 

i/p 


<  cfie-1'?  ( ±-Jk) 
Vl  -MJ 


1/p 


P  f 


r  /”r 

i  Ip 

fc(n) 

-  c'  sup 

S(i  -  k«,*i)i/(«»,*)r 

L  J  -TV 

n 

k= i 

1  1  fP 


f  \f(inmpde 

J  —  7 r 

<  d  sup 


i  i/p 


-  C6e~1/p 


1  -M„ 


1/p 


ll/ll 


Ar(n) 

EP-  Wn,k\)\f(an,k)\P 
k= 1 


1 1/p 


Since this is true for each $n$, it is true in the limit. Using Lemmas 3.4.8 and 3.4.7 to take the limit of the left-hand side as $n \to \infty$, we obtain


lim  inf 

n— kx> 


[/_* 


de 


i/p 


,  I  __  r  \  !/p 
C8e~1/p  -  ■== 

1  —  Mn 




=  ll/llp -limsup 


1  -r„ 


rfif-'/p  - _ =£- 

Cbe  (i  -Wj  "J"p 


1/p 


]_  — 

P  -  IU  IIP  llmsuP  \  -  =#p 

n— *■  oo  \  1  —  lvln 


—  C8e  1/p||/||D  limsup 

2  n  1/p 


i  Ip 


>  \\f\\p-C8e-l'r  Xi_c 


Since $C$ is independent of $\delta$ and $p > 1$, we may choose $\delta$ and $c$ so that


/  9  \  1/p 

ctrU,lT=c)  <*• 


giving 


P  <  sup 


A:(n) 


S(1  -  K,k\)\f(an,k)\f 


=  1 


i  Ip 


where  c  >  0  depends  on  C8J  5,  and  p  but  not  on  /,  {Sn},  or  {an> *}.  Thus  we  have  proven 
the  left-hand  inequality  of  (5). 

The proof of the right-hand inequality of (5) is much easier. From Lemma 3.4.1 above, we have that there exists a constant, $C$, depending only on $\epsilon$ such that

\[ |f(a_{n,k})|^p \le C \int_{K(a_{n,k}, \epsilon/2)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(a_{n,k}, \epsilon/2))}. \]


Therefore,

\[ \sum_{k=1}^{k(n)} |f(a_{n,k})|^p (1 - |a_{n,k}|) \le C \sum_{k=1}^{k(n)} (1 - |a_{n,k}|) \int_{K(a_{n,k}, \epsilon/2)} |f(\zeta)|^p \, \frac{dm(\zeta)}{m(K(a_{n,k}, \epsilon/2))}. \]


Since 


< 


< 


m(Ji  (e/2)))  (e/2)2(l  -  K,jfc|)2  (e/2)2(l  -  Mn)(l  -  |an>*|)  ’ 


we have

\[ \sum_{k=1}^{k(n)} |f(a_{n,k})|^p (1 - |a_{n,k}|) \le \frac{C}{(\epsilon/2)^2 (1 - M_n)} \sum_{k=1}^{k(n)} \int_{K(a_{n,k}, \epsilon/2)} |f(\zeta)|^p \, dm(\zeta). \tag{16} \]




The sets $K(a_{n,k}, \epsilon/2)$ are disjoint, and so we may bound the sum of integrals over them by an integral over an annulus containing them, giving


S  l/K*)|P(l  -  l«n,*|) 


k= 1 


< 


c 

-Wn) 


[  \f(0\pdm(0, 


where 


1  -  (e/2 )t\ 
Applying  this  inequality  to  (16),  we  get 


1  +  (e/2  )Mn 


*(")  c  fMn+^2><1~Mn  >  /-IT  JQ 

Y^\f(an,k)\p(l-\an,k\)  <  (e/2)3(1  _  ^  I-Jf^P^rdr 

C 


< 


< 


(</2)rn 
(e/2)(l-^) 


(e/2)a(l 

C  ...  4(e/2) 


/  jrr  2n/n?»'‘ir 

«/  r„ - : — ;  .  77TTT= — 


:(i  + 


(e/2)2(l-M„)v  1  ~  (e/2) 

Incorporating  the  constants  based  on  e  into  C,  we  get 

*(")  n  _  r  ) 

£|/K*)lp(i-k,*|)  <  c-K 

jfc=l 


xi-yi/ir 


(1  -  Mn) 


By Lemma 3.4.7, we know that (6) holds, and so there exists some constant $c < \infty$ such that $\frac{1 - r_n}{1 - M_n} \le c$ for all $n$. Incorporating this constant, $c$, into $C$ gives

\[ \sum_{k=1}^{k(n)} |f(a_{n,k})|^p (1 - |a_{n,k}|) \le C \|f\|_p^p, \]

proving the right-hand inequality of (5).


□ 


3.6 Representation of elements of the Hardy spaces, $H^p(\mathbb{D})$

Discussed in this section are two theorems which make it possible to use the results of Theorem 3.1.4 to find representations for elements of the Hardy spaces, $H^p(\mathbb{D})$. Theorem 3.6.1 is due to Bonsall and is found in [5:Theorem 1]. It was later extended by Dudley Ward and Partington in [33:Theorem 3]. Because of its length, a proof of Theorem 3.6.1, an elaboration of that given in [33], is given in Appendix B.

Theorem 3.6.4 is the one of interest here. It relates the Carleson inequalities established in Theorem 3.1.4 to norms of sequences of coefficients used to represent functions in $H^p(\mathbb{D})$. It is a rather trivial extension, based on Theorem 3.1.4, of [33:Theorem 5], which is itself an extension of [20:Theorem 5.5].

Theorem 3.6.1 (Ward and Partington, [33]) Let $X$ be a Banach space and define $X_1 \subset X$ by $X_1 = \{f \in X : \|f\| \le 1\}$. For $n \in \mathbb{Z}^+$, let $E_n$ be a finite subset of $X$ with $\#(E_n) = k(n)$. Let the set $E$ be defined by $E = \bigcup_n E_n$ and the set $E_{1,p}$ by

\[ E_{1,p} = \Big\{ h \in X : h = \sum_{n=1}^{N} \sum_{k=1}^{k(n)} \lambda_{n,k} u_{n,k},\ \sum_{n=1}^{N} \Big( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \Big)^{1/p} \le 1,\ u_{n,k} \in E_n,\ N < \infty \Big\}. \]

Let $\Lambda(E,f)$ be defined by

\[ \Lambda(E,f) = \Big\{ \lambda = \{\lambda_{n,k}\} : f = \sum_{n,k} \lambda_{n,k} u_{n,k},\ u_{n,k} \in E_n \Big\}. \]

Let $m$, $M$, $p$, and $q$ be positive constants with $\frac{1}{p} + \frac{1}{q} = 1$. Then the following statements are equivalent.

1. For each $\Phi \in X^*$,

\[ m\|\Phi\| \le \sup_n \Big\{ \Big( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \Big)^{1/q} : u_{n,k} \in E_n \Big\} \le M\|\Phi\|. \]


2.  mX1  C  EPii  and  for  each  h  £  EPji,  where  h  =  J2f=i  xj>  an d  xj  —  Ylk=l  ajtk^j,k}  with 
Ef=i  {zi'il  laj,tlp)‘/P  <  1,  «*  have  £f=1  ||*,||  <  M. 

3. For each $f \in X$, $\Lambda(E,f)$ is non-empty and

\[ \frac{\|f\|}{M} \le \inf\{ \|\lambda\|_{1,p} : \lambda \in \Lambda(E,f) \} \le \frac{\|f\|}{m}. \]


Proof.  See  Appendix  B. 




Before  stating  Theorem  3.6.4,  two  lemmas  needed  in  its  proof  will  be  stated  and 
proved.  The  following  lemma  will  be  necessary  to  show  that  the  conditions  of  Theorem  3.6.1 
are  met. 


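Lemma 3.6.2 Fix $a \in \mathbb{D}$, let $q \in (1,\infty)$, and define $\psi_a : \mathbb{D} \to \mathbb{C}$ by

\[ \psi_a(z) = \frac{(1 - |a|)^{1/q}}{1 - \bar{a} z}. \tag{17} \]

Then $\psi_a \in H^p(\mathbb{D})$, where $\frac{1}{q} = 1 - \frac{1}{p}$.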

Proof. Since $\psi_a$ is a rational function with a single pole at $z = 1/\bar{a} \notin \mathbb{D}$, it is analytic in $\mathbb{D}$ and so has a Maclaurin series expansion which is valid in all of $\mathbb{D}$. This expansion can be calculated to be

\[ \psi_a(z) = (1 - |a|)^{1/q} \sum_{j=0}^{\infty} \bar{a}^j z^j. \tag{18} \]


To  see  that  ipa  £  Hp,  it  is  sufficient  to  show  that 


j=  0 

Calculating, 


j=  0 


(i  -  iaD^x:(Mpy 

i=0 


(Hg 

1  -  \a\P 


<  OO  . 


□ 

This  lemma  provides  the  link  between  the  sampled  values  in  the  inequalities  of  the 
main  result,  Theorem  3.1.4,  and  the  functional  values  needed  to  prove  Theorem  3.6.4  based 
on  the  results  of  Theorem  3.6.1. 

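Lemma 3.6.3 Let $a \in \mathbb{D}$ and $p \in (1,\infty)$. Define $q$ by $\frac{1}{q} = 1 - \frac{1}{p}$. Let $F \in H^p(\mathbb{D})^* = H^q(\mathbb{D})$. Choose $f \in H^q(\mathbb{D})$ such that

\[ F(\phi) = \lim_{r \to 1^-} \int_{\partial\mathbb{D}} \phi(rz)\, \overline{f(rz)} \, d\sigma(z) \]

for all $\phi \in H^p(\mathbb{D})$. Let $\psi_a \in H^p(\mathbb{D})$ be as defined in (17). Then,

\[ F(\psi_a) = (1 - |a|)^{1/q}\, \overline{f(a)}. \]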


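Proof. Since $f \in H^q(\mathbb{D})$, it may be expressed as a Maclaurin series, that is, $f(z) = \sum_{j=0}^{\infty} \gamma_j z^j$ for some choice of $\{\gamma_j\}_{j=0}^{\infty}$. Using the Maclaurin series for $\psi_a$ given in (18), we have that the action of $F$ on $\psi_a$ is given by

\begin{align*}
F(\psi_a) &= \lim_{r \to 1^-} \int_{\partial\mathbb{D}} \overline{f(rz)}\, \psi_a(rz) \, d\sigma(z) \\
&= \lim_{r \to 1^-} \int_{\partial\mathbb{D}} \Big( \sum_{j=0}^{\infty} \bar{\gamma}_j (r\bar{z})^j \Big) \Big( (1 - |a|)^{1/q} \sum_{k=0}^{\infty} \bar{a}^k (rz)^k \Big) d\sigma(z) \\
&= (1 - |a|)^{1/q} \lim_{r \to 1^-} \int_{\partial\mathbb{D}} \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \bar{\gamma}_j \bar{a}^k r^{j+k} \bar{z}^j z^k \, d\sigma(z) \\
&= (1 - |a|)^{1/q} \lim_{r \to 1^-} \sum_{j=0}^{\infty} \bar{\gamma}_j \bar{a}^j r^{2j} \\
&= (1 - |a|)^{1/q} \sum_{j=0}^{\infty} \bar{\gamma}_j \bar{a}^j \\
&= (1 - |a|)^{1/q}\, \overline{f(a)}.
\end{align*}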


□ 
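The identity of Lemma 3.6.3 can be checked numerically. The following is a minimal sketch, not part of the original development: it assumes that the pairing places the complex conjugate on $f$ and that $\sigma$ is normalized arc length on the unit circle, and it uses a hypothetical polynomial test function `f`. The names `psi`, `pairing`, and the chosen values of `a`, `q`, and `r` are illustrative only.

```python
import cmath
import math


def psi(a, q, z):
    """psi_a(z) = (1 - |a|)^(1/q) / (1 - conj(a) z), as in (17)."""
    return (1.0 - abs(a)) ** (1.0 / q) / (1.0 - a.conjugate() * z)


def f(z):
    """Hypothetical test function (a polynomial, so it lies in every H^q)."""
    return 1.0 + 2.0 * z - 0.5j * z ** 2


def pairing(phi, g, r=0.999999, n=4096):
    """Approximate the integral of phi(rz) * conj(g(rz)) over the unit circle
    with respect to normalized arc length, using n equally spaced points."""
    total = 0.0
    for m in range(n):
        z = cmath.exp(2j * math.pi * m / n)
        total += phi(r * z) * g(r * z).conjugate()
    return total / n


a, q = 0.5 + 0.3j, 3.0
lhs = pairing(lambda z: psi(a, q, z), f)               # approximates F(psi_a)
rhs = (1.0 - abs(a)) ** (1.0 / q) * f(a).conjugate()   # (1 - |a|)^(1/q) conj(f(a))
print(abs(lhs - rhs))  # small; shrinks further as r approaches 1
```

The radius `r` is held just below 1 rather than taken to the limit, so the two sides agree only up to a term of order $1 - r^2$.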


As mentioned earlier, Theorem 3.6.4 is an extension of [20:Theorem 5.5]. Previously, [20:Theorem 5.5] was extended by Ward and Partington (see [33:Theorem 5]) to include specific upper and lower bounds. Below in Theorem 3.6.4, [33:Theorem 5] is modified to allow the set $\{a_{n,k}\}$ to satisfy the conditions of the main result, Theorem 3.1.4. The proof of Theorem 3.6.4 is unchanged from that given in [33:Theorem 5]. While this theorem is easily proven, this result is the one which will be of direct use in developing frames in Chapter V.

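The representations proven in this theorem are defined with the use of sequences in $\ell_{1,p}$ for $1 < p < \infty$, defined as

\[ \ell_{1,p} = \Big\{ \{\lambda_{n,k}\}_{n=1,2,\ldots;\ k=1,\ldots,k(n)} : \sum_{n=1}^{\infty} \Big( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \Big)^{1/p} < \infty \Big\}. \]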




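Theorem 3.6.4 Let $\{S_n\}$, $\delta$, $\epsilon$, and $q$ be as in Theorem 3.1.4 (where $q$ replaces $p$). Then every $f \in H^p(\mathbb{D})$ has the form

\[ f(z) = \sum_{n=1}^{\infty} \sum_{k=1}^{k(n)} \lambda_{n,k} \frac{(1 - |a_{n,k}|)^{1/q}}{1 - \bar{a}_{n,k} z}, \]

where $\frac{1}{q} = 1 - \frac{1}{p}$, $\lambda = \{\lambda_{n,k}\} \in \ell_{1,p}$ and

\[ c\|f\|_p \le \inf_{\lambda \in \Lambda_p(f)} \|\lambda\|_{1,p} \le C\|f\|_p \]

for some $0 < c \le C < \infty$ independent of $f$.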

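Proof. This proof follows that in [33].

It is necessary to show the existence of some $k, K > 0$ such that for all $f \in H^p(\mathbb{D})$,

\[ k\|f\|_p \le \inf_{\lambda \in \Lambda_p(f)} \|\lambda\|_{1,p} \le K\|f\|_p. \]

To do this, we will invoke Theorem 3.6.1.

First, note that $H^p(\mathbb{D})$ is a Banach space, with dual space $H^p(\mathbb{D})^* = H^q(\mathbb{D})$. Let $\Phi \in H^p(\mathbb{D})^*$ with corresponding $\phi \in H^q(\mathbb{D})$, and define $\psi_{n,k}$ by

\[ \psi_{n,k}(z) = \frac{(1 - |a_{n,k}|)^{1/q}}{1 - \bar{a}_{n,k} z}. \]

We know that $\psi_{n,k} \in H^p(\mathbb{D})$ by Lemma 3.6.2, and from Lemma 3.6.3, we know that

\[ \Phi(\psi_{n,k}) = (1 - |a_{n,k}|)^{1/q}\, \overline{\phi(a_{n,k})}. \]

From Theorem 3.1.4, we know that for our choice of $a_{n,k}$ and for some choice of $0 < m, M < \infty$, we have

\[ m\|\phi\| \le \sup_n \Big( \sum_{k=1}^{k(n)} |\phi(a_{n,k})|^q (1 - |a_{n,k}|) \Big)^{1/q} \le M\|\phi\|, \]

which gives

\[ m\|\phi\| \le \sup_n \Big( \sum_{k=1}^{k(n)} |\Phi(\psi_{n,k})|^q \Big)^{1/q} \le M\|\phi\|. \]

Having established this, Theorem 3.6.1, Part 1, applies, and so from Part 3 of the same theorem we have

\[ \frac{\|f\|_p}{M} \le \inf_{\lambda \in \Lambda_p(f)} \|\lambda\|_{1,p} \le \frac{\|f\|_p}{m}. \]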


□ 


3.7 Summary

In this chapter, Theorem 3.1.4 was proven, which provided bounds on weighted sums of sampled values of elements of the Hardy spaces, $H^p(\mathbb{D})$, for $1 < p < \infty$. This result was combined with the results in [33] to achieve a representation for elements of these Hardy spaces, as given in Theorem 3.6.4. It is this representation, applied to elements of the Hardy space $H^2(\mathbb{D})$ (itself also a Hilbert space), which will be used in the chapters to come in developing a frame suitable for speech representation.




IV. Frames for $L^2(\mathbb{R})$ and a frame-like operator

Presented in this chapter is a frame for $L^2(\mathbb{R})$ which is based on a frame or frames for $L^2(\mathbb{R}^+)$ and on hard, decaying exponential windows. This frame has a great deal of versatility, since it allows for frame design based on data-specific criteria. This is the first main result of this chapter. This result (Theorem 4.2.4) will be used in the development of the frame for speech representation, which is presented in Chapter VI.

A second main result of this chapter is the development of an operator akin to the frame operator. This operator is based on projections onto subspaces instead of frame elements. The usual frame operator is a special case of this more general operator. There are many practical applications for this operator, since its inverse can be calculated in a manner similar to that of the usual frame operator. One of the applications is to find representations based on the aforementioned frame for $L^2(\mathbb{R})$. This result (Theorem 4.3.1) is used in the calculation of frame representations in the speech frame in the computer program described in Chapter VI.

In Section 4.1, frames and the frame operator are defined and some useful properties are presented. Section 4.2 presents a general class of frames for $L^2(\mathbb{R})$, and Section 4.3 presents the subspace-based, frame-like operator.

4.1 Frame and frame operator properties

Some  of  the  theorems  presented  in  this  section  will  not  be  needed  until  the  next 
chapter.  However,  it  is  convenient  to  group  these  basic  theorems  here.  Frame  properties 
dealing  exclusively  with  the  frame  elements  used  in  the  next  chapter  will  be  presented 
there. 

Definition 4.1.1 Let $\mathcal{H}$ be a separable Hilbert space and let $\{\phi_n\} \subset \mathcal{H}$ be an indexed, countable set. Suppose there exist constants $0 < A \le B < \infty$ such that for every $f \in \mathcal{H}$,

\[ A\|f\|^2 \le \sum_n |\langle f, \phi_n \rangle|^2 \le B\|f\|^2, \]

where $\langle \cdot, \cdot \rangle$ is the inner product defined on $\mathcal{H}$ and $\|f\| = \langle f, f \rangle^{1/2}$. Then the set $\{\phi_n\}$ is called a frame for $\mathcal{H}$ with frame bounds $A$ and $B$. If $A = B$ the frame is called tight. If the set ceases to be a frame when a single element is excluded, the frame is called exact.

The operator $F : \mathcal{H} \to \mathcal{H}$ defined by

\[ Ff = \sum_n \langle f, \phi_n \rangle \phi_n \tag{19} \]

is called the frame operator.

The  following  two  theorems  are  well  known  and  are  presented  without  proof.  They 
can  be  found  in,  e.g.,  [4]. 

Theorem 4.1.2 Let $\{\phi_n\}$ be a frame in Hilbert space $\mathcal{H}$ with frame bounds $A$ and $B$. Then the frame operator $F : \mathcal{H} \to \mathcal{H}$ is invertible and the set $\{\tilde{\phi}_n\} = \{F^{-1}\phi_n\}$ is also a frame for $\mathcal{H}$, called the dual frame to $\{\phi_n\}$, with frame bounds $B^{-1}$ and $A^{-1}$. Also, every $f \in \mathcal{H}$ may be represented by

\[ f = \sum_n \langle f, \tilde{\phi}_n \rangle \phi_n = \sum_n \langle f, \phi_n \rangle \tilde{\phi}_n. \tag{20} \]

These representations are called the frame expansion and dual frame expansion, respectively, of $f$.

Theorem 4.1.3 Let $\{\phi_n\}$ be an exact frame in Hilbert space $\mathcal{H}$ with dual frame $\{\tilde{\phi}_n\}$. Then, $\{\phi_n\}$ and $\{\tilde{\phi}_n\}$ are biorthonormal. That is, for all $n$, $m$,

\[ \langle \phi_n, \tilde{\phi}_m \rangle = \delta_{m,n}, \]

where $\delta_{m,n}$ is the Kronecker delta.
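As a concrete illustration of Definition 4.1.1 and Theorem 4.1.2, the following sketch works in the finite-dimensional Hilbert space $\mathbb{R}^2$ with three unit vectors spaced 120 degrees apart. This example is not taken from the text; the vectors, the helper names, and the test vector `f` are assumptions made only for illustration. For this frame the frame operator is $\frac{3}{2}$ times the identity, so the frame is tight with $A = B = \frac{3}{2}$.

```python
import math


def dot(u, v):
    """Real inner product on R^2."""
    return sum(x * y for x, y in zip(u, v))


# Three unit vectors at 120-degree spacing: a redundant (non-exact) frame for R^2.
frame = [(math.cos(2 * math.pi * n / 3), math.sin(2 * math.pi * n / 3))
         for n in range(3)]


def frame_operator(vectors):
    """2x2 matrix of F f = sum_n <f, phi_n> phi_n, as in (19)."""
    return [[sum(v[i] * v[j] for v in vectors) for j in range(2)]
            for i in range(2)]


def solve2(m, b):
    """Solve the 2x2 linear system m x = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - b[0] * m[1][0]) / det)


F = frame_operator(frame)
dual = [solve2(F, phi) for phi in frame]          # dual frame: F^{-1} phi_n

f = (0.3, -1.2)
energy = sum(dot(f, phi) ** 2 for phi in frame)   # sum_n |<f, phi_n>|^2
# Frame expansion (20): f = sum_n <f, dual_n> phi_n
recon = [sum(dot(f, d) * phi[i] for d, phi in zip(dual, frame))
         for i in range(2)]
print(energy / dot(f, f), recon)
```

The printed ratio is the common frame bound, and `recon` reproduces `f` up to rounding even though the three frame vectors are linearly dependent.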

The  following  theorem  shows  that  a  frame  for  a  subspace  can  be  used  to  define  the 
orthogonal  projection  onto  that  subspace. 

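Theorem 4.1.4 Let $\mathcal{H}$ be a Hilbert space, $V \subset \mathcal{H}$ be a subspace, and $\{\phi_k\}$ and $\{\tilde{\phi}_k\}$ be a frame and a dual frame in $V$, respectively. Then, the orthogonal projection onto $V$, $P_V : \mathcal{H} \to V$, is given by

\[ P_V f = \sum_k \langle f, \tilde{\phi}_k \rangle \phi_k = \sum_k \langle f, \phi_k \rangle \tilde{\phi}_k, \]

where $f \in \mathcal{H}$. That is, the projection onto the subspace $V$ is given by the frame expansion in that subspace.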




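Proof. Let $f \in \mathcal{H}$. We have that $f = P_V f + P_{V^\perp} f$, where $V^\perp$ is the orthogonal complement of $V$ in $\mathcal{H}$. Since $P_V f \in V$, we may write

\[ P_V f = \sum_k \langle P_V f, \tilde{\phi}_k \rangle \phi_k = \sum_k \langle P_V f, \phi_k \rangle \tilde{\phi}_k. \]

Since $P_{V^\perp} f \in V^\perp$ and $V^\perp \perp V$, we know that for all $k$,

\[ \langle P_{V^\perp} f, \phi_k \rangle = \langle P_{V^\perp} f, \tilde{\phi}_k \rangle = 0. \]

This gives

\[ P_V f = \sum_k \big( \langle P_V f, \tilde{\phi}_k \rangle + \langle P_{V^\perp} f, \tilde{\phi}_k \rangle \big) \phi_k = \sum_k \big( \langle P_V f, \phi_k \rangle + \langle P_{V^\perp} f, \phi_k \rangle \big) \tilde{\phi}_k, \]

leading directly to

\[ P_V f = \sum_k \langle f, \tilde{\phi}_k \rangle \phi_k = \sum_k \langle f, \phi_k \rangle \tilde{\phi}_k. \]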

A;  A: 


□ 
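Theorem 4.1.4 can be illustrated in $\mathbb{R}^3$. In the sketch below, which is an assumed example rather than one from the text, $V$ is the plane $\{x_3 = 0\}$ and the frame for $V$ consists of three unit vectors at 120-degree spacing. That frame is tight with bound $\frac{3}{2}$, so its dual frame in $V$ is simply each element divided by $\frac{3}{2}$.

```python
import math


def dot(u, v):
    """Real inner product on R^3."""
    return sum(x * y for x, y in zip(u, v))


# Tight frame for the plane V = {x3 = 0} inside R^3 (frame bounds A = B = 3/2).
frame = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3), 0.0)
         for k in range(3)]
# For a tight frame with bound 3/2, the dual frame in V is phi_k / (3/2).
dual = [tuple(x / 1.5 for x in phi) for phi in frame]


def project(f):
    """P_V f = sum_k <f, dual_k> phi_k, the frame expansion within V."""
    coeffs = [dot(f, d) for d in dual]
    return tuple(sum(c * phi[i] for c, phi in zip(coeffs, frame))
                 for i in range(3))


p = project((1.0, 2.0, 3.0))
print(p)  # approximately (1, 2, 0): the component orthogonal to V is removed
```

Applying `project` twice gives the same result as applying it once, which is the idempotence expected of a projection.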

The proof of the following lemma is fairly straightforward. The lemma is needed to prove Theorem 5.2.7 in the next chapter.

Lemma 4.1.5 Let $\{\psi_n\} \cup \{\phi_n\}$ be an exact frame for a Hilbert space $\mathcal{H}$ such that $V = \mathrm{span}\{\psi_n\}$ and $W = \mathrm{span}\{\phi_n\}$, where $V \perp W$. Then the dual frame is given by $\{\tilde{\psi}_n\} \cup \{\tilde{\phi}_n\}$, where $\{\tilde{\psi}_n\}$ and $\{\tilde{\phi}_n\}$ are the dual frames in $V$ and $W$ of $\{\psi_n\}$ and $\{\phi_n\}$, respectively. If $A_V$ and $B_V$ are the frame bounds of $\{\psi_n\}$ in $V$ and $A_W$ and $B_W$ are the frame bounds of $\{\phi_n\}$ in $W$, then $\min\{A_V, A_W\}$ and $\max\{B_V, B_W\}$ are the frame bounds of $\{\psi_n\} \cup \{\phi_n\}$ in $\mathcal{H}$.

Proof.  Let  {^n}  U  {<f>n}  be  the  dual  frame  to  {^n}  U  {<£n}.  Since  {i>n}  U  {<j)n}  is 
exact  in  77,  we  know  immediately  that  {^n}  and  {4>n}  are  exact  in  V  and  W ,  respectively. 
By  the  properties  of  the  frame  and  dual  frame,  for  every  /  G  77,  we  have 

n  n 

n  n 




Since  C  W,  we  see  that 


i’k  =  ^2(^k,4>n)4>n+ 

n  n 


Since  {^n}  and  {4>n}  span  orthogonal  subspaces,  we  know  that  =  0  for  all  A:  and 

m.  By  the  exactness  of  the  frames,  we  know  that  (^,^n)  —  f>n,k-  Therefore, 

i>k  =  i>k 

for  all  fc.  Likewise,  <£*.  =  4>k  f°r  all  Therefore,  the  dual  frame  to  {^n}u{0n}  is  {V?n}u{</>n} 
where  {V>n}  and  {</>n}  are  the  dual  frames  in  V  and  W  to  {^n}  and  {</>n},  respectively. 

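To show the frame bounds, note that each $f \in \mathcal{H}$ can be uniquely written as $f = f_V + f_W$, where $f_V \in V$ and $f_W \in W$. Then,

\begin{align*}
\sum_n |\langle f, \psi_n \rangle|^2 + \sum_n |\langle f, \phi_n \rangle|^2 &= \sum_n |\langle f_V + f_W, \psi_n \rangle|^2 + \sum_n |\langle f_V + f_W, \phi_n \rangle|^2 \\
&= \sum_n |\langle f_V, \psi_n \rangle|^2 + \sum_n |\langle f_W, \phi_n \rangle|^2,
\end{align*}

which gives that

\[ A_V\|f_V\|^2 + A_W\|f_W\|^2 \le \sum_n |\langle f_V, \psi_n \rangle|^2 + \sum_n |\langle f_W, \phi_n \rangle|^2 \le B_V\|f_V\|^2 + B_W\|f_W\|^2. \]

Since

\[ \min\{A_V, A_W\}\|f\|^2 = \min\{A_V, A_W\}(\|f_V\|^2 + \|f_W\|^2) \le A_V\|f_V\|^2 + A_W\|f_W\|^2 \]

and

\[ \max\{B_V, B_W\}\|f\|^2 = \max\{B_V, B_W\}(\|f_V\|^2 + \|f_W\|^2) \ge B_V\|f_V\|^2 + B_W\|f_W\|^2, \]

this gives

\[ \min\{A_V, A_W\}\|f\|^2 \le \sum_n |\langle f, \psi_n \rangle|^2 + \sum_n |\langle f, \phi_n \rangle|^2 \le \max\{B_V, B_W\}\|f\|^2. \]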


□ 




This  next  theorem  reveals  a  useful  property  of  the  dual  frame  elements  used  in  the 
next  chapter.  First,  a  definition  is  required. 

Definition 4.1.6 Let $\{a_n\} \subset \mathcal{D}$ be an indexed set of points in some domain $\mathcal{D}$ and let $\{f_n\}$ be an indexed set where for each $n$, $f_n : \mathcal{D} \to \mathbb{C}$. The set $\{f_n\}$ is called a set of interpolation functions on $\{a_n\}$ if for each $n$ and each $m \ne n$ we have $f_n(a_n) \ne 0$ and $f_n(a_m) = 0$.

Theorem 4.1.7 Let $\{\phi_n\}$ be an exact frame in a Hilbert space $\mathcal{H}$ of functions $f : \mathcal{D} \to \mathbb{C}$ and let $\{a_n\} \subset \mathcal{D}$. Suppose that for some $\{c_n\} \subset \mathbb{C}$, $c_n \ne 0$ for all $n$, we have $\langle f, \phi_n \rangle = c_n f(a_n)$ for all $n$ and all $f \in \mathcal{H}$. Then $\{\tilde{\phi}_n\}$, the dual frame to $\{\phi_n\}$, will be of the form $\{\frac{1}{c_n} l_n\}$ where the set $\{l_n\}$ is a set of interpolation functions on the sample points $\{a_n\}$.

Proof. Because $\{\phi_n\}$ is an exact frame, from Theorem 4.1.3, we know that $\{\phi_n\}$ and $\{\tilde{\phi}_n\}$ are biorthonormal. Also, we have that $\langle \tilde{\phi}_m, \phi_n \rangle = c_n \tilde{\phi}_m(a_n)$, which implies $\tilde{\phi}_m(a_n) = \frac{\delta_{m,n}}{c_n}$. Therefore, $\{\tilde{\phi}_n\}$ is of the form $\{\frac{1}{c_n} l_n\}$ where $\{l_n\}$ is a set of interpolating functions on the sample points $\{a_n\}$. □

The  following  theorem  is  useful  in  that  it  gives  an  expression  for  the  dual  frame  that 
does  not  depend  on  iterative  methods.  While  its  applicability  may  be  limited,  it  is  useful 
in  the  next  chapter,  where  the  projections  involved  are  explicitly  known. 

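Theorem 4.1.8 Let $\{\psi_k\}$ be an exact frame in a Hilbert space $\mathcal{H}$. Then, the dual frame, $\{\tilde{\psi}_k\}$, is given by

\[ \tilde{\psi}_k = \frac{\psi_k - P_{S_k}\psi_k}{\|\psi_k - P_{S_k}\psi_k\|^2} = \frac{\psi_k - P_{S_k}\psi_k}{\langle \psi_k, \psi_k - P_{S_k}\psi_k \rangle}, \]

where for each $k$, $S_k$ is the space spanned by the set $\{\psi_j\}_{j \ne k}$ and the operator $P_{S_k}$ is the orthogonal projection operator onto space $S_k$.

Proof. Since the frame $\{\psi_k\}$ is exact, from Theorem 4.1.3 we know it satisfies the property that $\langle \psi_j, \tilde{\psi}_k \rangle = \delta_{j,k}$ for all $k$. Since for all $j \ne k$ we have that $\psi_j \in S_k$ and since $S_k \perp (\psi_k - P_{S_k}\psi_k)$, we have the result by inspection. □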
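The formula of Theorem 4.1.8 can be tried on a small exact frame. The sketch below is an assumed finite-dimensional example: a non-orthogonal basis of $\mathbb{R}^3$, which is an exact frame. The projection onto $S_k$ is computed from an orthonormal basis of $S_k$ obtained by Gram-Schmidt; the helper names and the choice of vectors are hypothetical.

```python
def dot(u, v):
    """Real inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))


def axpy(u, c, v):
    """Return u + c * v."""
    return [x + c * y for x, y in zip(u, v)]


def orthonormal_basis(vectors):
    """Gram-Schmidt orthonormalization of a list of real vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            w = axpy(w, -dot(w, e), e)
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis


def dual_element(frame, k):
    """(psi_k - P_{S_k} psi_k) / <psi_k, psi_k - P_{S_k} psi_k>,
    where S_k is spanned by the frame elements other than psi_k."""
    others = [v for j, v in enumerate(frame) if j != k]
    residual = list(frame[k])
    for e in orthonormal_basis(others):
        residual = axpy(residual, -dot(frame[k], e), e)
    scale = dot(frame[k], residual)
    return [x / scale for x in residual]


# A non-orthogonal basis of R^3, i.e. an exact frame.
frame = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
duals = [dual_element(frame, k) for k in range(3)]
print(duals)
```

The computed elements satisfy the biorthonormality relation of Theorem 4.1.3, $\langle \psi_j, \tilde{\psi}_k \rangle = \delta_{j,k}$, without inverting the frame operator.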

Also  note  that  the  result  of  the  previous  theorem  may  be  written  as 

,7,  _  ti  ~ 

1 




where $\{\phi_j\}$ is an orthonormal basis for the space $S_k$, or as


7  i’k  ~ 

Yk  —  - ~ -  7 

(4>k,i>k  - 

where $\{\phi_j\}$ is a frame for $S_k$ with dual frame $\{\tilde{\phi}_j\}$. Both of these representations may be useful in obtaining approximations to $\tilde{\psi}_k$.


4.2 A frame for $L^2(\mathbb{R})$.

In this section, a theorem showing a method to construct a frame for $L^2(\mathbb{R})$ from frames for $L^2(\mathbb{R}^+)$ using decaying exponential windows is proven. A decaying exponential window refers to a function, $f : \mathbb{R} \to \mathbb{C}$, of the form

\[ f(x) = \begin{cases} e^{(\alpha + i\beta)(x - t)} & x \in J \\ 0 & \text{otherwise} \end{cases} \tag{21} \]

where $\alpha, \beta, t \in \mathbb{R}$ and $J \subset \mathbb{R}$ is a bounded interval. This frame is useful in that the shifts of the windows (i.e., the locations of the bounded intervals on which the windows are non-zero) can vary to some extent, allowing for adaptive frames to be developed based on, e.g., a function that one wishes to represent well.

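This first lemma will be needed to establish frame bounds.

Lemma 4.2.1 Let $f \in L^2(J)$ where $J \subset [t_1, t_2]$ is a bounded interval, $t \in \mathbb{R}$, and $\alpha, \beta \in \mathbb{R}$. Then, $e^{(\alpha + i\beta)(\cdot - t)} f \in L^2(J)$ with

\[ \min\{e^{(t_1 - t)\alpha}, e^{(t_2 - t)\alpha}\}\|f\| \le \|e^{(\alpha + i\beta)(\cdot - t)} f\| \le \max\{e^{(t_1 - t)\alpha}, e^{(t_2 - t)\alpha}\}\|f\|. \]

Proof. That $e^{(\alpha + i\beta)(\cdot - t)} f \in L^2(J)$ is immediately obvious, since for a bounded interval $J$, the modulus of $e^{(\alpha + i\beta)(\cdot - t)} \in L^2(J)$ is bounded above and below for all $t \in \mathbb{R}$. Examining $\|e^{(\alpha + i\beta)(\cdot - t)} f\|$, we see

\begin{align*}
\|e^{(\alpha + i\beta)(\cdot - t)} f\|^2 &= \int_J |e^{(\alpha + i\beta)(x - t)} f(x)|^2 \, dx \\
&= \int_J (e^{(x - t)\alpha})^2 |f(x)|^2 \, dx.
\end{align*}

Since the exponential function is monotonic on $\mathbb{R}$, we know that

\[ \min\{e^{(t_1 - t)\alpha}, e^{(t_2 - t)\alpha}\}^2 |f(x)|^2 \le (e^{(x - t)\alpha})^2 |f(x)|^2 \le \max\{e^{(t_1 - t)\alpha}, e^{(t_2 - t)\alpha}\}^2 |f(x)|^2 \]

for each $x \in J$. This leads immediately to

\[ \min\{e^{(t_1 - t)\alpha}, e^{(t_2 - t)\alpha}\}\|f\| \le \|e^{(\alpha + i\beta)(\cdot - t)} f\| \le \max\{e^{(t_1 - t)\alpha}, e^{(t_2 - t)\alpha}\}\|f\|. \]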


□ 
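The bounds of Lemma 4.2.1 are easy to check numerically. The following sketch approximates the $L^2(J)$ norms with a midpoint rule on $J = [t_1, t_2]$; the test function, the interval, and the values of $\alpha$, $\beta$, and $t$ are arbitrary assumptions chosen only for illustration.

```python
import cmath
import math


def l2_norm(g, t1, t2, n=2000):
    """Midpoint-rule approximation of the L^2 norm of g on [t1, t2]."""
    h = (t2 - t1) / n
    return math.sqrt(sum(abs(g(t1 + (i + 0.5) * h)) ** 2 for i in range(n)) * h)


t1, t2, t = 0.0, 2.0, 0.5
alpha, beta = -0.7, 3.0


def f(x):
    """Hypothetical test function on J = [t1, t2]."""
    return math.sin(3.0 * x) + 0.2 * x


def windowed(x):
    """e^{(alpha + i beta)(x - t)} f(x), the windowed function of the lemma."""
    return cmath.exp((alpha + 1j * beta) * (x - t)) * f(x)


norm_f = l2_norm(f, t1, t2)
norm_wf = l2_norm(windowed, t1, t2)
lower = min(math.exp((t1 - t) * alpha), math.exp((t2 - t) * alpha)) * norm_f
upper = max(math.exp((t1 - t) * alpha), math.exp((t2 - t) * alpha)) * norm_f
print(lower, norm_wf, upper)
```

Because the modulus of the window does not depend on $\beta$, changing `beta` leaves `norm_wf` unchanged up to rounding.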

The  following  lemma  will  be  useful  when  combining  the  results  of  Theorem  4.2.4  and 
Theorem  4.3.1,  below,  which  is  done  in  Chapter  VI.  It  is  placed  here,  in  the  context  of 
similar  lemmas,  to  make  it  more  readily  understood. 

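Lemma 4.2.2 Let $\{\phi_j\}$ be a frame for $L^2(I)$ with frame bounds $c$ and $C$, where $I \subset \mathbb{R}$. Then, for every bounded interval $J \subset [t_m, t_n] \subset I$ and for every $\alpha, \beta \in \mathbb{R}$, $\{e^{(\alpha + i\beta)(\cdot)}\phi_j\}$ is a frame for $L^2(J)$, with frame bounds $(\min\{e^{\alpha t_m}, e^{\alpha t_n}\})^2 c$ and $(\max\{e^{\alpha t_m}, e^{\alpha t_n}\})^2 C$.

Proof. Let $f \in L^2(J)$. Then by Lemma 4.2.1, $e^{(\alpha - i\beta)(\cdot)} f \in L^2(J)$. Since $\{\phi_j\}$ is a frame for $L^2(I)$, it is a frame for $L^2(J)$ also, and so we may write

\[ c\|e^{(\alpha - i\beta)(\cdot)} f\|^2 \le \sum_j |\langle e^{(\alpha - i\beta)(\cdot)} f, \phi_j \rangle|^2 \le C\|e^{(\alpha - i\beta)(\cdot)} f\|^2, \]

where the norms and inner products are those of $L^2(J)$. Again by Lemma 4.2.1, this gives

\[ c \min\{e^{\alpha t_m}, e^{\alpha t_n}\}^2 \|f\|^2 \le \sum_j |\langle f, e^{(\alpha + i\beta)(\cdot)}\phi_j \rangle|^2 \le C \max\{e^{\alpha t_m}, e^{\alpha t_n}\}^2 \|f\|^2. \]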
3 


□ 

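The following lemma would appear to be of little practical use, since it does not apply to a wide class of sequences $\{a_k\}$. However, it turns out to be a crucial lemma when proving Theorem 4.2.4.

Lemma 4.2.3 Let $\{a_k\}$ be such that $\sum_{k=1}^{\infty} 2^k |a_k|^2 < \infty$. Then

\[ \Big| \sum_{k=1}^{\infty} a_k \Big|^2 \le \sum_{k=1}^{\infty} 2^k |a_k|^2. \]

Proof. Let $\sum_{k=1}^{\infty} 2^k |a_k|^2 = x^2 < \infty$. The case where $x = 0$ is trivial. Otherwise, define a new sequence $b = \{|b_k|\}$ by $|b_k| = \frac{2^{k/2}|a_k|}{x}$. We know that $b \in \ell^2(\mathbb{Z}^+)$, since

\[ \|b\|^2 = \sum_{k=1}^{\infty} |b_k|^2 = \sum_{k=1}^{\infty} \frac{2^k |a_k|^2}{x^2} = 1. \]

Define $c = \{2^{-k/2}\} \in \ell^2(\mathbb{Z}^+)$, with

\[ \|c\|^2 = \sum_{k=1}^{\infty} (2^{-k/2})^2 = \sum_{k=1}^{\infty} 2^{-k} = 1. \]

Since $b, c \in \ell^2(\mathbb{Z}^+)$, we know that $\langle b, c \rangle$ is well-defined, and is given by

\[ \langle b, c \rangle = \sum_{k=1}^{\infty} \frac{2^{k/2}|a_k|}{x}\, 2^{-k/2} = \frac{1}{x} \sum_{k=1}^{\infty} |a_k|. \]

However, we also know that $|\langle b, c \rangle| \le \|b\|\|c\| = 1$, which gives

\[ \sum_{k=1}^{\infty} |a_k| \le x. \]

Hence,

\[ \Big( \sum_{k=1}^{\infty} |a_k| \Big)^2 \le x^2 = \sum_{k=1}^{\infty} 2^k |a_k|^2. \]

Since the above argument also shows that $\sum_{k=1}^{\infty} a_k$ is absolutely convergent, we have that

\[ \Big| \sum_{k=1}^{\infty} a_k \Big|^2 \le \Big( \sum_{k=1}^{\infty} |a_k| \Big)^2 \le \sum_{k=1}^{\infty} 2^k |a_k|^2. \]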


□ 
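A quick numerical check of Lemma 4.2.3 on finite sequences (padded with zeros, these satisfy the hypothesis) is sketched below. The two sample sequences are assumptions chosen for illustration; the second, $a_k = 2^{-k}$, is the case in which the Cauchy-Schwarz step of the proof is tight, so the two sides nearly coincide.

```python
def lemma_sides(a):
    """Return (|sum a_k|^2, sum 2^k |a_k|^2) for a finite sequence a_1, ..., a_N."""
    lhs = abs(sum(a)) ** 2
    rhs = sum(2 ** k * abs(x) ** 2 for k, x in enumerate(a, start=1))
    return lhs, rhs


decaying = [(-0.6) ** k for k in range(1, 30)]   # alternating, geometrically decaying
extremal = [2.0 ** -k for k in range(1, 30)]     # a_k = 2^{-k}: nearly an equality

print(lemma_sides(decaying))
print(lemma_sides(extremal))
```

The weights $2^k$ are what allow the sum over $m$ in (25) to be pulled outside the square at the cost of a geometric factor.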

The following theorem, a main result of this chapter, provides a way to construct frames for $L^2(\mathbb{R})$ using the decaying exponential windows defined in (21) and frames on $L^2(\mathbb{R}^+)$. (Note that it is not a misprint that $l_n = \infty$ is allowed in Theorem 4.2.4.)




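In the following, $\mathbf{1}_X : \mathbb{R} \to \mathbb{R}$ is the characteristic function of the set $X \subset \mathbb{R}$. That is,

\[ \mathbf{1}_X(t) = \begin{cases} 1 & t \in X \\ 0 & \text{otherwise.} \end{cases} \]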


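Theorem 4.2.4 Fix $0 < \epsilon \le \delta < \infty$ and choose $\{t_n\}_{n \in \mathbb{Z}} \subset \mathbb{R}$ such that $t_n < t_{n+1}$ for all $n \in \mathbb{Z}$ and such that

\[ |n - m|\epsilon \le |t_n - t_m| \le |n - m|\delta \tag{22} \]

for every $n, m \in \mathbb{Z}$. Choose $\{l_n\}_{n \in \mathbb{Z}} \subset \mathbb{R}^+$ such that for every $n$, $t_{n+1} - t_n \le l_n \le \infty$. For each $n$, let $\{\phi_{n,j}\}_{j \in \mathbb{Z}} \subset L^2(\mathbb{R}^+)$ be a frame for $L^2(\mathbb{R}^+)$ with frame bounds $c$ and $C$. Let $\alpha, \beta \in \mathbb{R}$ be such that $\alpha < -\frac{\ln 2}{2\epsilon}$. Then the set $\{\mathbf{1}_{[t_n, t_n + l_n]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_{n,j}(\cdot - t_n)\}_{n,j \in \mathbb{Z}}$ is a frame for $L^2(\mathbb{R})$ with frame bounds

\[ \frac{c\, e^{2\delta\alpha}(1 - (e^{2\delta\alpha})^{K_{\min}})}{1 - e^{2\delta\alpha}} \quad \text{and} \quad \frac{2C(1 - (2e^{2\epsilon\alpha})^{K_{\max}})}{1 - 2e^{2\epsilon\alpha}}, \]

where $K_{\min}$, defined by

\[ K_{\min} = \inf_n \big( \sup\{k : t_{n+k} - t_n \le l_n\} \big), \]

represents the minimum (over $n$) number of intervals $[t_k, t_{k+1}]$, $k \in \mathbb{Z}$, completely covered by the interval $[t_n, t_n + l_n]$ and where $K_{\max}$, defined by

\[ K_{\max} = \sup\big\{ \sup\{k : t_{n+k-1} - t_n < l_n\} : n \in \mathbb{Z} \big\}, \]

represents the maximum (over $n$) number of intervals $[t_k, t_{k+1}]$, $k \in \mathbb{Z}$, completely or partially covered by the interval $[t_n, t_n + l_n]$. Both $K_{\min}$ and $K_{\max}$ can be infinite.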
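The frame bounds of Theorem 4.2.4 are simple closed-form expressions, and a small helper makes it easy to see how they behave as the parameters vary. The sketch below assumes the bounds read $c\,e^{2\delta\alpha}(1 - (e^{2\delta\alpha})^{K_{\min}})/(1 - e^{2\delta\alpha})$ and $2C(1 - (2e^{2\epsilon\alpha})^{K_{\max}})/(1 - 2e^{2\epsilon\alpha})$, and that the condition on $\alpha$ is $\alpha < -\ln 2/(2\epsilon)$, which is equivalent to the inequality $2e^{2\epsilon\alpha} < 1$ used in the proof. The function name and argument names are hypothetical.

```python
import math


def frame_bounds(c, C, eps, delta, alpha, k_min, k_max):
    """Lower and upper frame bounds of Theorem 4.2.4 (as read above).

    k_min and k_max may be math.inf, in which case the corresponding
    power vanishes because its base lies strictly between 0 and 1.
    """
    if not 0 < eps <= delta:
        raise ValueError("need 0 < eps <= delta")
    if not alpha < -math.log(2.0) / (2.0 * eps):
        raise ValueError("need alpha < -ln(2) / (2 eps)")
    x = math.exp(2.0 * delta * alpha)        # lies in (0, 1) since alpha < 0
    y = 2.0 * math.exp(2.0 * eps * alpha)    # lies in (0, 1) by the condition on alpha
    x_pow = 0.0 if math.isinf(k_min) else x ** k_min
    y_pow = 0.0 if math.isinf(k_max) else y ** k_max
    lower = c * x * (1.0 - x_pow) / (1.0 - x)
    upper = 2.0 * C * (1.0 - y_pow) / (1.0 - y)
    return lower, upper


print(frame_bounds(1.0, 2.0, 0.5, 1.0, -1.0, 1, 1))
print(frame_bounds(1.0, 2.0, 0.5, 1.0, -1.0, math.inf, math.inf))
```

With $K_{\min} = K_{\max} = 1$ the bounds reduce to $c\,e^{2\delta\alpha}$ and $2C$. As $\alpha$ approaches $-\ln 2/(2\epsilon)$ from below with $K_{\max} = \infty$, the upper bound grows without limit, so the ratio of the bounds depends strongly on the choice of $\alpha$.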

Proof. To improve the readability of the proof, the proof that follows is done in detail for the simpler case where for each $n$, $l_n = t_{n+K} - t_n$ for some fixed $1 \le K < \infty$ and where $\{\phi_{n,j}\}_{j \in \mathbb{Z}} = \{\phi_{m,j}\}_{j \in \mathbb{Z}}$ for all $m, n \in \mathbb{Z}$. For this constraint on $\{l_n\}$, we have that $K_{\min} = K = K_{\max}$. The changes necessary for the more general case are tedious, but not difficult, and can be done by inspection.

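Let $f \in L^2(\mathbb{R})$. For each $m \in \mathbb{Z}$, define $f_m \in L^2([t_m, t_{m+1}])$ by

\[ f_m(t) = \begin{cases} f(t) & t_m \le t < t_{m+1} \\ 0 & \text{otherwise.} \end{cases} \tag{23} \]

Clearly, $f = \sum_m f_m$ and since $f_m \perp f_n$ for all $m \ne n$, we have that $\|f\|^2 = \sum_m \|f_m\|^2$. Therefore, we may write

\begin{align*}
\sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2
&= \sum_{n,j} \Big| \Big\langle \sum_m f_m, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \Big\rangle \Big|^2 \tag{24} \\
&= \sum_{n,j} \Big| \sum_{m=n}^{n+K-1} \big\langle f_m, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \Big|^2 \\
&= \sum_{n,j} \Big| \sum_{m=n}^{n+K-1} \big\langle e^{(\alpha - i\beta)(\cdot - t_n)} f_m, \phi_j(\cdot - t_n) \big\rangle \Big|^2 \\
&\le \sum_{n,j} \Big( \sum_{m=n}^{n+K-1} \big| \big\langle e^{(\alpha - i\beta)(\cdot - t_n)} f_m, \phi_j(\cdot - t_n) \big\rangle \big| \Big)^2. \tag{25}
\end{align*}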

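Provided the sum on the right-hand side is finite (which will later be seen to be true), we may use Lemma 4.2.3 in (25) to obtain

\begin{align*}
\sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2
&\le \sum_{n,j} \sum_{m=n}^{n+K-1} 2^{m-n+1} \big| \big\langle e^{(\alpha - i\beta)(\cdot - t_n)} f_m, \phi_j(\cdot - t_n) \big\rangle \big|^2 \\
&= \sum_{n,j} \sum_{k=0}^{K-1} 2^{k+1} \big| \big\langle e^{(\alpha - i\beta)(\cdot - t_n)} f_{n+k}, \phi_j(\cdot - t_n) \big\rangle \big|^2 \\
&= \sum_{n,j} \sum_{k=0}^{K-1} 2^{k+1} \big| \big\langle e^{(\alpha - i\beta)(\cdot - t_{n-k})} f_n, \phi_j(\cdot - t_{n-k}) \big\rangle \big|^2 \\
&= \sum_n \sum_{k=0}^{K-1} 2^{k+1} \sum_j \big| \big\langle e^{(\alpha - i\beta)(\cdot - t_{n-k})} f_n, \phi_j(\cdot - t_{n-k}) \big\rangle \big|^2. \tag{26}
\end{align*}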


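Since $\{\phi_j\}$ is a frame for $L^2(\mathbb{R}^+)$, we may write

\[ \sum_j \big| \big\langle e^{(\alpha - i\beta)(\cdot - t_{n-k})} f_n, \phi_j(\cdot - t_{n-k}) \big\rangle \big|^2 \le C \|e^{(\alpha - i\beta)(\cdot - t_{n-k})} f_n\|^2. \]

From Lemma 4.2.1 and our choice of $\alpha$, we have that

\[ \|e^{(\alpha - i\beta)(\cdot - t_{n-k})} f_n\| \le e^{(t_n - t_{n-k})\alpha} \|f_n\|, \]

and so

\[ \sum_j \big| \big\langle e^{(\alpha - i\beta)(\cdot - t_{n-k})} f_n, \phi_j(\cdot - t_{n-k}) \big\rangle \big|^2 \le C e^{2(t_n - t_{n-k})\alpha} \|f_n\|^2. \tag{27} \]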

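Returning to (26) and using (27), we have

\begin{align*}
\sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2
&\le \sum_n \sum_{k=0}^{K-1} 2^{k+1} C e^{2(t_n - t_{n-k})\alpha} \|f_n\|^2 \\
&\le C \sum_n \|f_n\|^2 \sum_{k=0}^{K-1} 2^{k+1} e^{2(t_n - t_{n-k})\alpha} \\
&\le C \sum_n \|f_n\|^2 \sum_{k=0}^{K-1} 2^{k+1} e^{2\epsilon k \alpha},
\end{align*}

where, for the last step, we have used $t_n - t_{n-k} \ge \epsilon k$ and $\alpha < 0$. Noting that for our choice of $\alpha$ and $\epsilon$, the inequality $2e^{2\epsilon\alpha} < 1$ holds, we find

\begin{align*}
\sum_{k=0}^{K-1} 2^{k+1} e^{2\epsilon k \alpha}
&= 2 \sum_{k=0}^{\infty} (2e^{2\epsilon\alpha})^k - 2^{K+1} e^{2\epsilon K \alpha} \sum_{k=0}^{\infty} (2e^{2\epsilon\alpha})^k \\
&= 2(1 - (2e^{2\epsilon\alpha})^K) \sum_{k=0}^{\infty} (2e^{2\epsilon\alpha})^k \\
&= \frac{2(1 - (2e^{2\epsilon\alpha})^K)}{1 - 2e^{2\epsilon\alpha}}.
\end{align*}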


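Using this, we may then write

\begin{align*}
\sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2
&\le \frac{2C(1 - (2e^{2\epsilon\alpha})^K)}{1 - 2e^{2\epsilon\alpha}} \sum_n \|f_n\|^2 \\
&= \frac{2C(1 - (2e^{2\epsilon\alpha})^K)}{1 - 2e^{2\epsilon\alpha}} \|f\|^2, \tag{28}
\end{align*}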

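which establishes our upper bound. The lower bound is somewhat more straightforward. Beginning as before from the left-hand side of (24),

\[ \sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2 = \sum_n \sum_j \big| \big\langle \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha - i\beta)(\cdot - t_n)} f, \phi_j(\cdot - t_n) \big\rangle \big|^2. \]

Since $\{\phi_j\}_{j \in \mathbb{Z}}$ is a frame for $L^2(\mathbb{R}^+)$, we may write

\[ \sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2 \ge c \sum_n \|\mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha - i\beta)(\cdot - t_n)} f\|^2. \]

Using $f_m$ as defined in (23), this gives

\begin{align*}
\sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2
&\ge c \sum_n \Big\| \sum_{m=n}^{n+K-1} e^{(\alpha - i\beta)(\cdot - t_n)} f_m \Big\|^2 \\
&= c \sum_n \sum_{m=n}^{n+K-1} \|e^{(\alpha - i\beta)(\cdot - t_n)} f_m\|^2,
\end{align*}

since the functions $e^{(\alpha - i\beta)(\cdot - t_n)} f_m$ are orthogonal to each other. Then, by Lemma 4.2.1, we have

\begin{align*}
\sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2
&\ge c \sum_n \sum_{m=n}^{n+K-1} e^{2(t_{m+1} - t_n)\alpha} \|f_m\|^2 \\
&= c \sum_m \|f_m\|^2 \sum_{n=m-K+1}^{m} e^{2(t_{m+1} - t_n)\alpha} \\
&\ge c \sum_m \|f_m\|^2 \sum_{n=m-K+1}^{m} e^{2(m+1-n)\delta\alpha},
\end{align*}

where the last step is valid because $\alpha < 0$ and (22). Continuing, we have

\begin{align*}
\sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2
&\ge c \sum_m \|f_m\|^2 \sum_{k=0}^{K-1} e^{2(k+1)\delta\alpha} \\
&= c \sum_m \|f_m\|^2 \sum_{k=0}^{K-1} (e^{2\delta\alpha})^{k+1} \\
&= c \sum_m \|f_m\|^2 \, \frac{e^{2\delta\alpha}(1 - (e^{2\delta\alpha})^K)}{1 - e^{2\delta\alpha}} \\
&= \frac{c\, e^{2\delta\alpha}(1 - (e^{2\delta\alpha})^K)}{1 - e^{2\delta\alpha}} \|f\|^2, \tag{29}
\end{align*}

which gives our lower bound. Bringing (28) and (29) together, we see

\[ \frac{c\, e^{2\delta\alpha}(1 - (e^{2\delta\alpha})^K)}{1 - e^{2\delta\alpha}} \|f\|^2 \le \sum_{n,j} \big| \big\langle f, \mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n) \big\rangle \big|^2 \le \frac{2C(1 - (2e^{2\epsilon\alpha})^K)}{1 - 2e^{2\epsilon\alpha}} \|f\|^2, \]

which completes the proof that $\{\mathbf{1}_{[t_n, t_{n+K}]}(\cdot)\, e^{(\alpha + i\beta)(\cdot - t_n)} \phi_j(\cdot - t_n)\}_{n,j \in \mathbb{Z}}$ is indeed a frame for $L^2(\mathbb{R})$. □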


4.3 A generalization of the frame operator.

Presented in this section is a generalization of the concept of a frame and frame operator. This particular generalization, the second main result of this chapter, is very useful in practical applications, such as those involving frames of the type proven in Theorem 4.2.4.

Let $\mathcal{H}$ be a Hilbert space and let $\{\phi_k\}$ and $\{\tilde{\phi}_k\}$ be a frame and dual frame in $\mathcal{H}$. The representation to be developed is based on considering the operation $\langle f, \phi_k \rangle \phi_k$, which figures prominently in (19), to be a projection onto a subspace. Then, one may consider the mapping $\phi_k \mapsto \tilde{\phi}_k$ to be a linear transform between subspaces, giving an analogy to (20). In this way, it is possible to generalize frame elements to frame subspaces.

This  has  advantages  in  that  it  will  allow  the  use  of  nested  frames.  That  is,  there 
is  no  constraint  on  how  the  orthogonal  projections  onto  the  subspaces  are  computed.  In 
particular,  given  frames  in  the  appropriate  subspaces,  the  projection  onto  the  subspace 
may  be  represented  by  its  frame  expansion,  as  in  Theorem  4.1.4. 




Any  exact  frame  may  be  recast  in  this  more  generalized  framework  to  give  (possibly) 
faster  calculations  of  coefficients  in  the  frame  expansion.  The  coefficients  given  in  this  case 
will  be  identical  to  those  found  by  the  usual  methods.  If  the  frame  is  not  exact,  it  may  still 
be  possible  to  recast  the  frame  such  that  identical  coefficients  are  achieved,  given  careful 
construction  of  the  subspaces. 


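Theorem 4.3.1 For a Hilbert space $\mathcal{H}$, let $\{V_j\} \subset \mathcal{H}$ be subspaces such that $\mathcal{H} = \bigcup_j V_j$, where the $\{V_j\}$ need not be mutually orthogonal. For each $j$, let $c_j \in \mathbb{R}$ and let $P_{V_j}: \mathcal{H} \to V_j$ be the orthogonal projection operator onto $V_j$. Suppose there exist $0 < A \le B < \infty$ such that for each $f \in \mathcal{H}$,

$$A\|f\|^2 \le \sum_j c_j \|P_{V_j} f\|^2 \le B\|f\|^2. \tag{30}$$

Then, the operator $F: \mathcal{H} \to \mathcal{H}$ given by

$$Ff = \sum_j c_j P_{V_j} f \tag{31}$$

is bounded, self-adjoint, and invertible on its range. Define $T_{V_j}: V_j \to \tilde{V}_j$ by $T_{V_j} = F^{-1}$, where $\tilde{V}_j \subset \mathcal{H}$ is the range of $T_{V_j}$. Define $\tilde{T}_{V_j} = T_{V_j}^*$. Then, for all $f \in \mathcal{H}$, we have

$$f = \sum_j c_j T_{V_j} P_{V_j} f = \sum_j c_j \tilde{T}_{V_j} P_{\tilde{V}_j} f.$$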

Proof.  This  proof  begins  by  showing  that  F  is  bounded. 


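$$\begin{aligned}
\|F\| &= \sup_{\|f\|=\|g\|=1} |\langle Ff, g\rangle| \\
&= \sup_{\|f\|=\|g\|=1} \Big|\Big\langle \sum_j c_j P_{V_j} f,\, g\Big\rangle\Big| \\
&= \sup_{\|f\|=\|g\|=1} \Big|\sum_j c_j \langle P_{V_j} f, g\rangle\Big| \\
&= \sup_{\|f\|=\|g\|=1} \Big|\sum_j c_j \big\langle P_{V_j} f,\ \langle P_{V_j} f, g\rangle P_{V_j} f + \big(g - \langle P_{V_j} f, g\rangle P_{V_j} f\big)\big\rangle\Big|.
\end{aligned} \tag{32}$$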


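Noting that $\big(g - \langle P_{V_j} f, g\rangle P_{V_j} f\big) \perp P_{V_j} f$, (32) becomes

$$\begin{aligned}
\|F\| &= \sup_{\|f\|=\|g\|=1} \Big|\sum_j c_j \big\langle P_{V_j} f,\ \langle P_{V_j} f, g\rangle P_{V_j} f\big\rangle\Big| \\
&= \sup_{\|f\|=\|g\|=1} \Big|\sum_j c_j \langle P_{V_j} f, g\rangle \langle P_{V_j} f, P_{V_j} f\rangle\Big| \\
&\le \sup_{\|f\|=\|g\|=1} \sum_j c_j \big|\langle P_{V_j} f, g\rangle \langle P_{V_j} f, P_{V_j} f\rangle\big| \\
&= \sup_{\|f\|=\|g\|=1} \sum_j c_j |\langle P_{V_j} f, g\rangle|\, \|P_{V_j} f\|^2.
\end{aligned}$$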


Since $\|f\| = \|g\| = 1$ and, for each $j$, $P_{V_j}$ is an orthogonal projection, we know that $|\langle P_{V_j} f, g\rangle| \le 1$, leading to

$$\|F\| \le \sup_{\|f\|=\|g\|=1} \sum_j c_j \|P_{V_j} f\|^2 \le \sup_{\|f\|=\|g\|=1} B\|f\|^2 = B.$$

Therefore, $F$ is bounded and $\|F\| \le B$.

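To show that $F$ is self-adjoint, let $f, g \in \mathcal{H}$ and recall that $\{c_j\} \subset \mathbb{R}$. Then

$$\begin{aligned}
\langle Ff, g\rangle &= \Big\langle \sum_j c_j P_{V_j} f,\, g\Big\rangle \\
&= \sum_j c_j \langle P_{V_j} f, g\rangle \\
&= \sum_j c_j \langle f, P_{V_j} g\rangle \\
&= \Big\langle f,\, \sum_j c_j P_{V_j} g\Big\rangle \\
&= \langle f, Fg\rangle,
\end{aligned}$$

where I have used the fact that projection operators are self-adjoint. Therefore, $F$ is self-adjoint on $\mathcal{H}$.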

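Next, since $F$ is linear, to show that $F$ is invertible on its range it is sufficient to show that it is injective. So, assume that $Ff = 0$ for some $f \in \mathcal{H}$. Using (30), elementary calculations give

$$\begin{aligned}
\langle f, Ff\rangle &= \Big\langle f,\, \sum_j c_j P_{V_j} f\Big\rangle \\
&= \sum_j c_j \big\langle P_{V_j} f + P_{V_j^\perp} f,\ P_{V_j} f\big\rangle \\
&= \sum_j c_j \langle P_{V_j} f, P_{V_j} f\rangle \\
&= \sum_j c_j \|P_{V_j} f\|^2 \\
&\ge A\|f\|^2.
\end{aligned}$$

Now, using the inequality $|\langle f, Ff\rangle| \le \|f\|\,\|Ff\|$, we get

$$A\|f\| \le \|Ff\|.$$

Therefore, $Ff = 0$ implies $f = 0$, and so $F$ is injective and therefore invertible on its range. Since $F$ is invertible on its range, we may write

$$f = F^{-1}Ff = F^{-1}\sum_j c_j P_{V_j} f = \sum_j c_j F^{-1} P_{V_j} f.$$

Since $P_{V_j}$ has a range of $V_j$, using the definition of $T_{V_j}$, we have

$$f = \sum_j c_j T_{V_j} P_{V_j} f.$$

That is, $I = \sum_j c_j T_{V_j} P_{V_j}$.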
Next, write

$$I = I^* = \Big(\sum_j c_j T_{V_j} P_{V_j}\Big)^* = \sum_j c_j \big(T_{V_j} P_{V_j}\big)^*. \tag{33}$$


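The form of $(T_{V_j}P_{V_j})^*$ can be found by solving $\langle T_{V_j}P_{V_j}f, g\rangle = \langle f, (T_{V_j}P_{V_j})^*g\rangle$ for $f, g \in \mathcal{H}$:

$$\begin{aligned}
\langle T_{V_j}P_{V_j}f, g\rangle &= \big\langle T_{V_j}P_{V_j}f,\ (P_{\tilde{V}_j} + P_{\tilde{V}_j^\perp})g\big\rangle \\
&= \langle T_{V_j}P_{V_j}f, P_{\tilde{V}_j}g\rangle + \langle T_{V_j}P_{V_j}f, P_{\tilde{V}_j^\perp}g\rangle.
\end{aligned} \tag{34}$$

Since the range of $T_{V_j}$ is $\tilde{V}_j$ and $\tilde{V}_j \perp \tilde{V}_j^\perp$, we have $\langle T_{V_j}P_{V_j}f, P_{\tilde{V}_j^\perp}g\rangle = 0$. Thus (34) becomes

$$\langle T_{V_j}P_{V_j}f, g\rangle = \langle T_{V_j}P_{V_j}f, P_{\tilde{V}_j}g\rangle = \langle P_{V_j}f, \tilde{T}_{V_j}P_{\tilde{V}_j}g\rangle. \tag{35}$$

Since $\tilde{T}_{V_j} = T_{V_j}^*$, we have

$$\begin{aligned}
\langle T_{V_j}P_{V_j}f, g\rangle &= \langle P_{V_j}f, \tilde{T}_{V_j}P_{\tilde{V}_j}g\rangle \\
&= \big\langle (P_{V_j} + P_{V_j^\perp} - P_{V_j^\perp})f,\ \tilde{T}_{V_j}P_{\tilde{V}_j}g\big\rangle \\
&= \big\langle (P_{V_j} + P_{V_j^\perp})f,\ \tilde{T}_{V_j}P_{\tilde{V}_j}g\big\rangle - \langle P_{V_j^\perp}f, \tilde{T}_{V_j}P_{\tilde{V}_j}g\rangle \\
&= \langle f, \tilde{T}_{V_j}P_{\tilde{V}_j}g\rangle - \langle P_{V_j^\perp}f, \tilde{T}_{V_j}P_{\tilde{V}_j}g\rangle.
\end{aligned}$$

As before, since the range of $\tilde{T}_{V_j}$ is $V_j$, we know that $\langle P_{V_j^\perp}f, \tilde{T}_{V_j}P_{\tilde{V}_j}g\rangle = 0$. Therefore,

$$\langle T_{V_j}P_{V_j}f, g\rangle = \langle f, \tilde{T}_{V_j}P_{\tilde{V}_j}g\rangle.$$

That is, $(T_{V_j}P_{V_j})^* = \tilde{T}_{V_j}P_{\tilde{V}_j}$. Using this in (33), we have

$$f = I^*f = \sum_j c_j \big(T_{V_j}P_{V_j}\big)^* f = \sum_j c_j \tilde{T}_{V_j}P_{\tilde{V}_j} f.$$

Therefore, for all $f \in \mathcal{H}$, we have

$$f = \sum_j c_j T_{V_j}P_{V_j} f = \sum_j c_j \tilde{T}_{V_j}P_{\tilde{V}_j} f.$$

□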


In Theorem 4.3.4, it will be shown that the same iterative methods for determining the inverse of the operator $F$ commonly used with the ordinary frame operator will also work here. This work follows along the same path taken in, e.g., [9] and [11]. Two preliminary lemmas will make the proof of Theorem 4.3.4 easier.

The  following  lemma  is  very  useful  in  establishing  that  the  lower  frame  bound  need 
not  be  known  in  order  to  use  the  iterative  method  described  below  in  Theorem  4.3.4.  That 
is,  one  need  only  know  that  a  lower  frame  bound  exists  in  order  to  use  the  iterative  method 
and  not  the  actual  value  of  the  lower  frame  bound. 


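Lemma 4.3.2 Let $\mathcal{H}$, $\{V_j\}$, $\{c_j\}$, $F$, $A$, and $B$ be as in Theorem 4.3.1. Let $A' \in (0, B]$. Define the operator $R$ by $R = I - \frac{2}{A'+B}F$. Then $\|R\| < 1$.

Proof. We will bound $\|R\|$ by noting that $R$, as the sum of two self-adjoint operators, is self-adjoint, and that for self-adjoint operators,

$$\|R\| = \sup_{\|f\|=1} |\langle Rf, f\rangle|.$$

Examining $\langle Rf, f\rangle$, we find that

$$\begin{aligned}
\langle Rf, f\rangle &= \Big\langle \Big(I - \frac{2}{A'+B}F\Big)f,\, f\Big\rangle \\
&= \|f\|^2 - \frac{2}{A'+B}\langle Ff, f\rangle \\
&= \|f\|^2 - \frac{2}{A'+B}\sum_j c_j \|P_{V_j}f\|^2.
\end{aligned}$$

Inequality (30) implies

$$\|f\|^2 - \frac{2B}{A'+B}\|f\|^2 \le \langle Rf, f\rangle \le \|f\|^2 - \frac{2A}{A'+B}\|f\|^2,$$

leading directly to

$$\Big(\frac{A'-B}{A'+B}\Big)\|f\|^2 \le \langle Rf, f\rangle \le \Big(\frac{B+A'-2A}{B+A'}\Big)\|f\|^2.$$

By the restriction $0 < A \le B < \infty$ and our choice of $A' \in (0, B]$, we know

$$0 \le \frac{B-A'}{B+A'} < 1$$

and

$$-1 < \frac{B+A'-2A}{B+A'} < 1.$$

Therefore,

$$|\langle Rf, f\rangle| \le \max\Big\{\frac{B-A'}{B+A'},\ \Big|\frac{B+A'-2A}{B+A'}\Big|\Big\}\,\|f\|^2 < \|f\|^2.$$

Therefore,

$$\|R\| = \sup_{\|f\|=1} |\langle Rf, f\rangle| \le \max\Big\{\frac{B-A'}{B+A'},\ \Big|\frac{B+A'-2A}{B+A'}\Big|\Big\} < 1. \tag{36}$$

□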
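The bound in Lemma 4.3.2 can be checked numerically on a small example. The following is a minimal sketch, not part of the original development: it assumes $\mathcal{H} = \mathbb{R}^2$, three hypothetical one-dimensional subspaces (lines through the origin at the listed angles), and hypothetical weights $c_j$. In this setting $F$ is a symmetric $2 \times 2$ matrix, the optimal $A$ and $B$ in (30) are its extreme eigenvalues, and $\|R\|$ is the largest $|1 - 2\lambda/(A'+B)|$ over those eigenvalues.

```python
import math

# Hypothetical example data: three lines in R^2 and their weights c_j.
angles = [0.0, math.pi / 3, math.pi / 2]
weights = [1.0, 2.0, 1.0]

def frame_operator_entries(angles, weights):
    """Entries (a, b, d) of F = sum_j c_j P_j = [[a, b], [b, d]],
    where P_j projects orthogonally onto the line at angle t_j."""
    a = b = d = 0.0
    for t, c in zip(angles, weights):
        a += c * math.cos(t) ** 2
        b += c * math.cos(t) * math.sin(t)
        d += c * math.sin(t) ** 2
    return a, b, d

def extreme_eigenvalues(a, b, d):
    """Smallest and largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius

def norm_R(A_prime, A, B):
    """Operator norm of R = I - 2/(A'+B) F when the spectrum of F has extremes A and B."""
    s = 2.0 / (A_prime + B)
    return max(abs(1.0 - s * A), abs(1.0 - s * B))

A, B = extreme_eigenvalues(*frame_operator_entries(angles, weights))
candidates = [0.1, 0.5, A, 2.0, B]
norms = {A_prime: norm_R(A_prime, A, B) for A_prime in candidates}
for A_prime, value in norms.items():
    print(f"A' = {A_prime:.3f}  ->  ||R|| = {value:.4f}")
```

Every admissible $A'$ gives $\|R\| < 1$ in this example, and the smallest value occurs at $A' = A$.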


It is easily shown that the value of $A'$ which minimizes the bound on $\|R\|$ in (36) is $A' = A$, where $A$ is the maximum value for which (30) holds. Since the rate of convergence of the iterative method to be presented in Theorem 4.3.4 is dependent on $\|R\|$, if $A$ can be estimated, this estimate should be used. However, it is sufficient to show that $A > 0$ exists in order to guarantee convergence of the iterative method in Theorem 4.3.4 for any choice of $0 < A' \le B$.

The previous lemma leads directly to a representation for $F^{-1}$.


Lemma 4.3.3 Let $\mathcal{H}$, $\{V_j\}$, $\{c_j\}$, $F$, $A$, and $B$ be as in Theorem 4.3.1 and let $A'$ and $R$ be as in Lemma 4.3.2. Then $F^{-1}: \mathcal{H} \to \mathcal{H}$ is given by

$$F^{-1} = \frac{2}{A'+B}\sum_{n=0}^{\infty} R^n.$$

Proof. Note that $\frac{A'+B}{2}(I - R) = \frac{A'+B}{2}\big(I - (I - \frac{2}{A'+B}F)\big) = F$. Therefore, we may write

$$F^{-1} = \frac{2}{A'+B}(I - R)^{-1}.$$

From Lemma 4.3.2, we have that $\|R\| < 1$. Therefore, we may expand $(I - R)^{-1}$ in a Neumann series to get

$$F^{-1} = \frac{2}{A'+B}(I - R)^{-1} = \frac{2}{A'+B}\sum_{n=0}^{\infty} R^n.$$

□


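We are now ready to present an iterative method for finding a representation for $f \in \mathcal{H}$.

Theorem 4.3.4 Let $\mathcal{H}$ and $F$ be as in Theorem 4.3.1 and let $R$ be as in Lemma 4.3.2. For $f \in \mathcal{H}$ and for each $N \in \mathbb{Z}^+$, define $f_N$ by

$$f_N = \frac{2}{A'+B}\sum_{n=0}^{N} R^n F f.$$

Then,

$$f = \lim_{N\to\infty} f_N.$$

Proof. From Lemma 4.3.3, we have that $F^{-1} = \frac{2}{A'+B}\sum_{n=0}^{\infty} R^n$. Using this, we have

$$\begin{aligned}
f &= F^{-1}Ff \\
&= \frac{2}{A'+B}\sum_{n=0}^{\infty} R^n F f \\
&= \lim_{N\to\infty} \frac{2}{A'+B}\sum_{n=0}^{N} R^n F f \\
&= \lim_{N\to\infty} f_N.
\end{aligned}$$

□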
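The iteration of Theorem 4.3.4 can be sketched in a few lines. This is an illustrative finite-dimensional example under stated assumptions, not the program described in Chapter VI: $\mathcal{H} = \mathbb{R}^3$, the subspaces $V_j$ are three hypothetical planes given by their normals, and the weights $c_j$ are hypothetical. The upper bound is taken as $B = \sum_j c_j$, which is valid because $\|P_{V_j} f\| \le \|f\|$, and $A' = B$ is used so that no knowledge of the lower bound is needed.

```python
def matvec(M, v):
    """Matrix-vector product for small dense matrices stored as lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def plane_projection(normal):
    """Orthogonal projection onto the plane through 0 with the given normal."""
    n2 = sum(x * x for x in normal)
    return [[(1.0 if i == j else 0.0) - normal[i] * normal[j] / n2
             for j in range(3)] for i in range(3)]

# Hypothetical, non-orthogonal subspaces V_j (planes) and weights c_j.
normals = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
weights = [1.0, 1.0, 0.5]
projections = [plane_projection(n) for n in normals]

def apply_F(f):
    """F f = sum_j c_j P_{V_j} f, as in (31)."""
    out = [0.0, 0.0, 0.0]
    for c, P in zip(weights, projections):
        out = [o + c * p for o, p in zip(out, matvec(P, f))]
    return out

def reconstruct(Ff, B, A_prime, N):
    """f_N = 2/(A'+B) * sum_{n=0}^{N} R^n (F f), with R = I - 2/(A'+B) F."""
    scale = 2.0 / (A_prime + B)
    term = list(Ff)          # R^0 F f
    acc = list(Ff)
    for _ in range(N):
        F_term = apply_F(term)
        term = [t - scale * ft for t, ft in zip(term, F_term)]  # term <- R term
        acc = [a + t for a, t in zip(acc, term)]
    return [scale * a for a in acc]

f = [0.3, -1.2, 2.0]
B = sum(weights)
f_rec = reconstruct(apply_F(f), B=B, A_prime=B, N=80)
err = max(abs(a - b) for a, b in zip(f, f_rec))
print("max reconstruction error:", err)
```

Only applications of $F$ are needed; $F^{-1}$ is never formed explicitly. A better estimate of $A'$ would reduce the number of iterations required.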

If it is desired to explicitly define the spaces $\{\tilde{V}_j\}$ specified in the statement of Theorem 4.3.1, it is possible to apply the operator $F^{-1}$ to a basis of each space $V_j$. That is, the action of $F^{-1}P_{V_j}$ is completely defined by its action on a basis of $V_j$.


4.4 Summary

In this chapter, I developed a frame for $L^2(\mathbb{R})$ based on a frame or frames for $L^2(\mathbb{R}^+)$ (Theorem 4.2.4). This frame can be particularly useful in dealing with $L^2(\mathbb{R})$ functions
where  characteristics  of  the  functions  change  with  respect  to  the  independent  variable. 
Although  this  frame  is  not  peculiar  to  speech,  it  has  properties  that  will  make  it  especially 
useful  in  creating  frames  for  speech,  as  will  be  done  in  Chapter  VI. 

Independent of this frame for $L^2(\mathbb{R})$, I developed an operator akin to the ordinary frame operator (Theorem 4.3.1). This operator, a generalization of the ordinary frame operator, can be used as a basis for faster and more versatile algorithms for finding frame representations. This alternate representation can be particularly useful in finding frame representations in frames which are not amenable to ordinary frame representation calculations. In particular, frames which are not over-complete but which contain many non-orthogonal elements may be used more effectively by being rearranged for more efficient use with the more general operator. This alternate representation will be used in the program described in Chapter VI, and was found to be quite helpful.


V.  Representation  in  H2(D) 

Chapter III presented an extension of a representation theorem for elements of $H^p(\mathbb{D})$. That theorem proved that a representation exists, but it gave no way to find any such representation. Producing such a representation is the goal of this chapter.

This chapter is concerned with representations in $H^2(\mathbb{D})$, a Hilbert space, and so after some preliminaries, the more general Hardy spaces, $H^p(\mathbb{D})$, will be neglected in favor of $H^2(\mathbb{D})$. For convenience, in this chapter, the notation $\|\cdot\|$ will refer to the $H^2$ norm on $\mathbb{D}$. Also, since the work here will be in a Hilbert space, the functional evaluation $\psi[f]$ will be written as the inner product $\langle f, \psi\rangle$.

There  are  two  main  results  in  this  chapter,  Theorems  5.2.7  and  5.2.10,  both  of  which 
establish  frames  for  H2(D).  Theorem  5.2.10  is  the  more  useful  of  the  two,  and  is  based  more 
firmly  than  Theorem  5.2.7  on  the  results  of  Chapter  III.  These  results  will  be  combined 
with  the  frame  developed  in  Chapter  IV  to  create  a  frame  for  speech  in  Chapter  VI. 

Blaschke products will be defined in Section 5.1. They are of great use when working with representations involving functions of the form given in (17), which was the form used in the representation of Chapter III. Some of the basic properties of Blaschke products will be presented.

Functions of the form of (17) are rational functions with a single pole in $\mathbb{C} \setminus \mathbb{D}$ and no zeros, and so will be referred to as simple pole functions. Note that throughout this chapter and the chapters that follow, the form of simple pole functions which will be used is given by

$$\psi(z) = \frac{(1 - |a|^2)^{1/2}}{(1 - \bar{a}z)}, \tag{37}$$

where $a \in \mathbb{D}$. This form varies from (17) used in Chapter III by a constant bounded between 1 and $\sqrt{2}$, and so this change presents no difficulties in using the results based on (17). The impetus behind this change is that for $\psi$ of the form of (37), we have $\|\psi\| = 1$. An important characteristic of functions of the form of (37) (Lemma 3.6.3) is that for all $f \in H^2(\mathbb{D})$,

$$\langle f, \psi\rangle = f(a)(1 - |a|^2)^{1/2}. \tag{38}$$

Unfortunately, as is shown in Section 5.2, there is no set of simple pole functions that can be a frame for $H^2(\mathbb{D})$. Fortunately, as is also shown in Section 5.2, there are both frames and orthonormal bases for $H^2(\mathbb{D})$ based on Gram-Schmidt orthonormalization of sets of simple pole functions. Section 5.3 gives an algorithm for finding approximations to elements of $H^2(\mathbb{D})$ using the frames presented in Section 5.2.
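The normalization $\|\psi\| = 1$ and the evaluation property (38) of the simple pole functions can be checked numerically. The sketch below is only an illustration under stated assumptions: the $H^2$ inner product is taken as $\langle f, g\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(e^{i\theta})\overline{g(e^{i\theta})}\,d\theta$ and approximated by an equally spaced quadrature on the unit circle, the conjugate in the denominator of (37) is assumed, and the point $a$ and the polynomial $f$ are hypothetical.

```python
import cmath
import math

def inner(f, g, M=1024):
    """Approximate <f, g> = (1/2pi) * integral of f(e^{it}) * conj(g(e^{it})) dt
    by an M-point equally spaced rule on the unit circle."""
    total = 0j
    for m in range(M):
        z = cmath.exp(2j * math.pi * m / M)
        total += f(z) * complex(g(z)).conjugate()
    return total / M

def simple_pole(a):
    """psi(z) = (1 - |a|^2)^(1/2) / (1 - conj(a) z), the assumed form of (37)."""
    scale = math.sqrt(1.0 - abs(a) ** 2)
    return lambda z: scale / (1.0 - a.conjugate() * z)

a = 0.5 + 0.3j                                   # hypothetical point in the disk
psi = simple_pole(a)
f = lambda z: 1.0 + 2.0 * z - 0.5j * z ** 3      # hypothetical test function

norm_psi = math.sqrt(inner(psi, psi).real)
lhs = inner(f, psi)                              # <f, psi>
rhs = f(a) * math.sqrt(1.0 - abs(a) ** 2)        # f(a) (1 - |a|^2)^(1/2)
print("||psi|| =", norm_psi)
print("|<f,psi> - f(a)(1-|a|^2)^(1/2)| =", abs(lhs - rhs))
```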

5.1 Blaschke Products

For our work, we will define Blaschke products to be functions of the form

$$B(z) = \prod_{k=1}^{n} \frac{a_k - z}{1 - \bar{a}_k z}, \tag{39}$$

where $1 \le n \le \infty$, $\{a_k\} \subset \mathbb{D}$, and $\sum_{k=1}^{n}(1 - |a_k|) < \infty$. This definition is identical (up to a constant of magnitude 1) to that given in [12].

The  following  lemma  is  given  without  proof.  It  summarizes  some  of  the  properties 
of  the  Blaschke  products  that  will  be  needed  in  the  proofs  ahead.  The  details  of  the  proof 
can  be  found  in  various  places  in,  e.g.,  [12]. 

Lemma 5.1.1 Let $B$ be a Blaschke product. Then $B \in H^p(\mathbb{D})$ for each $0 < p < \infty$, $|B(z)| < 1$ for all $z \in \mathbb{D}$, $|B(e^{i\theta})| = 1$ for a.e. $\theta \in [-\pi, \pi)$, and $\|B\|_p = 1$.

The  next  two  lemmas  illustrate  some  of  the  useful  properties  of  the  Blaschke  products 
that  will  be  used  in  various  proofs. 

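Lemma 5.1.2 Let $B$ be a Blaschke product and $g \in H^p(\mathbb{D})$ for $p > 1$. Then $\|Bg\|_p = \|g\|_p$.

Proof. Let

$$g(e^{i\theta}) = \lim_{r \to 1} g(re^{i\theta}). \tag{40}$$

By Theorem 3.2.2, we have that for $g$ defined in this way,

$$\|Bg\|_p^p = \frac{1}{2\pi}\int_{-\pi}^{\pi} \big|B(e^{i\theta})g(e^{i\theta})\big|^p \, d\theta.$$

However, from Lemma 5.1.1, we have that $|B(e^{i\theta})| = 1$ for a.e. $\theta \in [-\pi, \pi)$. Therefore,

$$\|Bg\|_p^p = \frac{1}{2\pi}\int_{-\pi}^{\pi} \big|g(e^{i\theta})\big|^p \, d\theta = \|g\|_p^p.$$

□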


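Lemma 5.1.3 Let $B$ be a Blaschke product and $f, g \in H^2(\mathbb{D})$. Then $\langle Bf, Bg\rangle = \langle f, g\rangle$.

Proof. Allowing $f(e^{i\theta})$ and $g(e^{i\theta})$ to be defined by the (possibly non-analytic) extensions of $f$ and $g$ to $\partial\mathbb{D}$ given above in (40), we have that

$$\begin{aligned}
\langle Bf, Bg\rangle &= \frac{1}{2\pi}\int_{-\pi}^{\pi} B(e^{i\theta})f(e^{i\theta})\,\overline{B(e^{i\theta})g(e^{i\theta})}\, d\theta \\
&= \langle f, g\rangle,
\end{aligned}$$

since by Lemma 5.1.1, we have that $|B(e^{i\theta})| = 1$ for a.e. $\theta \in [-\pi, \pi)$. □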
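The properties in Lemmas 5.1.1 and 5.1.3 are easy to observe numerically for a finite Blaschke product. The sketch below is an illustration under assumptions that are not taken from the text: the factors are assumed to have the form $(a_k - z)/(1 - \bar{a}_k z)$, the zeros and the test functions are hypothetical, and the inner product is approximated by an equally spaced quadrature on the unit circle.

```python
import cmath
import math

def inner(f, g, M=1024):
    """Approximate <f, g> = (1/2pi) * integral of f(e^{it}) * conj(g(e^{it})) dt."""
    total = 0j
    for m in range(M):
        z = cmath.exp(2j * math.pi * m / M)
        total += f(z) * complex(g(z)).conjugate()
    return total / M

def blaschke(zeros):
    """Finite Blaschke product with the given zeros (assumed factor form)."""
    def B(z):
        out = 1.0 + 0j
        for a in zeros:
            out *= (a - z) / (1.0 - a.conjugate() * z)
        return out
    return B

zeros = [0.5 + 0.0j, -0.3 + 0.4j, 0.1 - 0.7j]    # hypothetical zeros in the disk
B = blaschke(zeros)

# |B| = 1 on the unit circle and |B| < 1 at sample points inside the disk.
boundary_dev = max(abs(abs(B(cmath.exp(2j * math.pi * m / 360))) - 1.0)
                   for m in range(360))
inside_max = max(abs(B(r * cmath.exp(2j * math.pi * m / 60)))
                 for r in (0.2, 0.5, 0.9) for m in range(60))

# <Bf, Bg> = <f, g> for two hypothetical polynomials f and g.
f = lambda z: 1.0 + z
g = lambda z: z ** 2 - 0.5j
Bf = lambda z: B(z) * f(z)
Bg = lambda z: B(z) * g(z)
iso_gap = abs(inner(Bf, Bg) - inner(f, g))

print(boundary_dev, inside_max, iso_gap)
```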

The  following  theorem  is  vital  to  the  work  presented  later.  It  appears  in  [12],  where 
it  is  attributed  to  F.  Riesz.  It  is  presented  here  without  proof. 

Theorem 5.1.4 (F. Riesz) Let $f \in H^p(\mathbb{D})$, $0 < p < \infty$, $f$ not identically zero. Then $f$ can be written in the form

$$f = Bg,$$

where $B$ is a Blaschke product and $g \in H^p(\mathbb{D})$ does not vanish.

5.2 Frames for $H^2(\mathbb{D})$

The results from Theorem 3.6.4 involved representations for $f \in H^p(\mathbb{D})$ of the form

$$f = \sum_{n,k} \lambda_{n,k}\psi_{n,k}, \tag{41}$$


with  E„  (E£)|A„.,r)1"  <  oo,  where  {A,,*}  C  C  is  a  doubly  indexed  set  with  n  £  Z+ 
and  k  =  1, . . . ,  k(n)  and  k(n )  <  oo  for  each  n,  and  where 


4>n,k(Z) 


(i-  K,*l)1/P 

1  - 


for some set $\{a_{n,k}\} \subset \mathbb{D}$. Since I want to represent functions in $H^2(\mathbb{D})$ as a sum (41), I need to determine how to find the coefficients $\{\lambda_{n,k}\}$.

If the set $\{\psi_{n,k}\}$ were a frame, then this problem would be well understood. Unfortunately, as Theorem 5.2.1 shows, this is never the case.


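Theorem 5.2.1 There does not exist any set $\{a_k\} \subset \mathbb{D}$ such that the set $\{\psi_k\}$ is a frame for $H^2(\mathbb{D})$, where $\psi_k$ is defined by

$$\psi_k(z) = \frac{(1 - |a_k|^2)^{1/2}}{1 - \bar{a}_k z}.$$

Proof. Let $\{a_k\} \subset \mathbb{D}$ and assume $\sum_k (1 - |a_k|) < \infty$. By Lemma 5.1.1, we have that the Blaschke product $B$ defined by (39) is analytic in $\mathbb{D}$ with $\|B\| = 1$. From (38), we have

$$\langle B, \psi_k\rangle = (1 - |a_k|^2)^{1/2}B(a_k) = 0$$

for all $k$. This implies that

$$\sum_k |\langle B, \psi_k\rangle|^2 = 0.$$

Since $\|B\| = 1 \ne 0$, this implies that there does not exist $c > 0$ such that $c\|f\|^2 \le \sum_k |\langle f, \psi_k\rangle|^2$ for all $f \in H^2(\mathbb{D})$. Therefore, if $\sum_k (1 - |a_k|) < \infty$, then $\{\psi_k\}$ is not a frame for $H^2(\mathbb{D})$.

Next, assume that $\sum_k (1 - |a_k|) = \infty$. Choose $f \in H^2(\mathbb{D})$ defined by $f(z) = 1$ for all $z \in \mathbb{D}$. For this choice of $f$, $\|f\| = 1$. Evaluating $\sum_k |\langle f, \psi_k\rangle|^2$, we see

$$\begin{aligned}
\sum_k |\langle f, \psi_k\rangle|^2 &= \sum_k \big|(1 - |a_k|^2)^{1/2} f(a_k)\big|^2 \\
&= \sum_k (1 - |a_k|)(1 + |a_k|) \\
&\ge \sum_k (1 - |a_k|) = \infty.
\end{aligned}$$

Since $\|f\| = 1$, this implies there does not exist $C < \infty$ such that $\sum_k |\langle f, \psi_k\rangle|^2 \le C\|f\|^2$ for all $f \in H^2(\mathbb{D})$. Therefore, if $\sum_k (1 - |a_k|) = \infty$, then $\{\psi_k\}$ is not a frame.

Since either $\sum_k (1 - |a_k|) < \infty$ or $\sum_k (1 - |a_k|) = \infty$, this implies that there does not exist a set $\{a_k\} \subset \mathbb{D}$ such that $\{\psi_k\}$ is a frame for $H^2(\mathbb{D})$. □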
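Both halves of this argument can be illustrated numerically. The sketch below is a hedged illustration, not part of the proof: it uses a finite, hypothetical set of points for the first case (where the Blaschke product through the points is orthogonal to every $\psi_k$), and the hypothetical sequence $a_k = 1 - 1/(k+1)$ for the second case (where the partial sums of $|\langle 1, \psi_k\rangle|^2 = 1 - |a_k|^2$ grow without bound). The conjugate conventions in $\psi_k$ and $B$ are assumptions.

```python
import cmath
import math

def inner(f, g, M=1024):
    """Approximate <f, g> = (1/2pi) * integral of f(e^{it}) * conj(g(e^{it})) dt."""
    total = 0j
    for m in range(M):
        z = cmath.exp(2j * math.pi * m / M)
        total += f(z) * complex(g(z)).conjugate()
    return total / M

def simple_pole(a):
    """psi(z) = (1 - |a|^2)^(1/2) / (1 - conj(a) z) (assumed convention)."""
    scale = math.sqrt(1.0 - abs(a) ** 2)
    return lambda z: scale / (1.0 - a.conjugate() * z)

def blaschke(zeros):
    """Finite Blaschke product with factors (a - z) / (1 - conj(a) z) (assumed form)."""
    def B(z):
        out = 1.0 + 0j
        for a in zeros:
            out *= (a - z) / (1.0 - a.conjugate() * z)
        return out
    return B

# Case 1: B vanishes at every a_k, so every coefficient <B, psi_k> is zero
# even though ||B|| = 1; no lower frame bound can hold.
points = [0.5 + 0.0j, -0.3 + 0.4j, 0.1 - 0.7j]
B = blaschke(points)
lower_sum = sum(abs(inner(B, simple_pole(a))) ** 2 for a in points)
norm_B = math.sqrt(inner(B, B).real)

# Case 2: for f = 1 and a_k = 1 - 1/(k+1), the sum of |<f, psi_k>|^2 = 1 - |a_k|^2
# dominates a harmonic series; no upper frame bound can hold.
def upper_partial_sum(K):
    return sum(1.0 - (1.0 - 1.0 / (k + 1)) ** 2 for k in range(1, K + 1))

print("case 1:", lower_sum, "with ||B|| =", norm_B)
for K in (10, 1000, 100000):
    print("case 2, K =", K, "partial sum =", upper_partial_sum(K))
```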

Despite the fact that a set of simple poles, $\{\psi_k\}$, cannot be a frame for $H^2(\mathbb{D})$, an orthonormal basis for $H^2(\mathbb{D})$ can be created by performing the Gram-Schmidt orthonormalization process on $\{\psi_k\}$. In [22], Ninness and Gustafsson prove the following theorem.

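Theorem 5.2.2 Let $\{a_n\}_{n=1}^{\infty} \subset \mathbb{D}$ be such that $a_n \ne a_m$ for all $n \ne m$ and define $\{\phi_n\}_{n=1}^{\infty} \subset H^2(\mathbb{D})$ by

$$\phi_n = \psi_n B_n,$$

where

$$\psi_n(z) = \frac{(1 - |a_n|^2)^{1/2}}{1 - \bar{a}_n z}, \tag{42}$$

$$B_n(z) = \prod_{k=1}^{n-1} \frac{a_k - z}{1 - \bar{a}_k z}, \tag{43}$$

and $B_1 = 1$. The set $\{\phi_n\}_{n=1}^{\infty}$ is an orthonormal basis for $H^2(\mathbb{D})$ if and only if

$$\sum_{k=1}^{\infty} (1 - |a_k|) = \infty.$$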


Proof. This proof follows the one given in [22], but I provide more detail. It consists of showing that the sequence $\{\phi_n\}$ is the result of performing the Gram-Schmidt orthonormalization process on the sequence $\{\psi_n\}$ and then showing the necessity and sufficiency of the condition $\sum_k (1 - |a_k|) = \infty$ to have a basis.

Define the result of the Gram-Schmidt orthonormalization process on $\{\psi_k\}$ to be $\{\phi_k\}$. Since $\|\psi_1\| = 1$, the first element of the orthonormalized sequence is given by $\phi_1 = B_1\psi_1 = \psi_1$. It is easily shown using (38) together with (42) and (43) that

$$\begin{aligned}
\psi_2 - \langle \psi_2, \phi_1\rangle \phi_1 &= \Big(\frac{\bar{a}_1 - \bar{a}_2}{1 - a_1\bar{a}_2}\Big)\psi_2 B_2 \\
&= \overline{B_2(a_2)}\,\psi_2 B_2.
\end{aligned}$$

Since $\|\psi_n\| = 1$ and $B_n$ is a Blaschke product, by Lemma 5.1.1, we have that $\|\psi_n B_n\| = 1$. Hence,

$$\|\psi_2 - \langle \psi_2, \phi_1\rangle \phi_1\| = \big\|\overline{B_2(a_2)}\,\psi_2 B_2\big\| = |B_2(a_2)|,$$

implying that the next function in the orthonormal sequence, $\phi_2$, is given by

$$\phi_2 = \psi_2 B_2.$$

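Now assume that $\phi_k = \psi_k B_k$ for each $k = 1, \ldots, n-1$. Applying the Gram-Schmidt process, we see

$$\begin{aligned}
\psi_n(z) - \sum_{k=1}^{n-1} \langle \psi_n, \phi_k\rangle \phi_k(z) &= \psi_n(z) - \sum_{k=1}^{n-1} \overline{\langle \phi_k, \psi_n\rangle}\,\phi_k(z) \\
&= \psi_n(z) - \sum_{k=1}^{n-1} (1 - |a_n|^2)^{1/2}\,\overline{\phi_k(a_n)}\,\phi_k(z)
\end{aligned}$$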
(i-Kl2)1/2 

(1  -0^2) 

>(1  —  l®*!2)1^2  [tt  (aj  ~  Q») 

j= 1  (1  —  a«a.7  ) 


(1  —  anak) 


k  =  l 


(1  -  k|J),/J  |  W  (%  ~  a.) 


(1  -  atz) 


fJl(i-V) 


=  (i-kl2)1'1 


n  —  1 


_  (1  ~  la*|2) 

f  1  _  Ti  n 


(1  -  anz)  "  (1  -  a„a*)(l  -  afc2:) 


TT  ( toj  Q>n  )  ( Q*j  tin  ) 

/=i  (1  -a7o7)(l  -ajz) 


(1-K|2)1/2 


(1  -  anz)  [rifc=i(1  -  Ofc«)(l  -  fl»ait)] 

-(i-^)g(i-KI2) 


n  —  1 


!!(!  -  Cj2)(l  -  anaj) 


3= 1 


*  =  1 


n  —  1 

_[_  (l  Unttj)(l  UjZ) 
j=fc  +  l 


fc  — 1 

“  ttn)(Gj  “  z) 

i= 1 


(44) 




For  the  next  part  of  the  calculation,  combine  the  first  term  of  the  difference  with  the  first 
term  of  the  sum,  giving 


n  — 1 


V’nO)  -  &)&(*) 


k- 1 


(l-kn|2)1/2 


(1  &nz')  n*=l(l  Q'n&k) 


(1  -  a\z)(l  —  anai) 


n  —  1 


fl(l  ajZ)(^  anaj ) 

i= 2 


-(l-anz)(l-  laj2) 


n  —  1 


j^J(l  anaj)(l  &jz) 


j= 2 


n-l 


-(l-an^Ed-KI2) 


k=2 


n  — 1 


(i-KI2)1/2 


(1  -  a„z)  [n*=i  (!  -  “fc^X1  -  On  a*) 
“ (1  —  On2)  £(1  —  l^l2) 


II  (l-OnOjXl-a^) 

j  =  £  +  l 

I 

(of  - a^X®i  “  *) 


At  —  1 


JJ(aj  an){aj  z) 


3  =  1 


n  — 1 


JJ  (1  andj) 

i=2 


Ar  =  2 


n  —  1 


n  (1  “  an^')(l  ~  djz) 

j=k+ 1 


Ac  —  1 


II(ai  ~an)(a3  ~  Z) 

3  =  1 


Note  that  the  quantity  (a!  —  an)(di  —  z)  is  a  factor  of  each  term  in  the  sum.  Factoring  it 
out  of  the  difference,  we  then  have 


n—  1 


Mz)  -  n  7  (t>k)4>k(z) 


k= 1 


(1  ~  K|2)1/2(gj  -  OnXoi  ~  z) 


(1  -  anz)  [n*=x(l  -  akz){  1  -  anak) 
-(1-^)£(1-K|2) 


n  — 1 


na-o^x1-®^) 


J=  2 


Jfc=2 


n-l 


(1  anaj)(^  a3 Z) 

j=k+ 1 


At  —  1 


II(ai 

i=2 


Carefully  examining  the  difference,  it  can  be  seen  that  it  is  again  in  the  form  seen  in  (44). 
Since  n  is  finite,  we  may  iteratively  continue  this  regrouping,  to  get 


n  —  1 


i’n(Z)~Yl^n’<f>k)MZ) 


(1  -  |a„|2)1/2  [nUK  -  ®n)(Ojfe  -  z) 


k  =  l 


(1  -  anz)  [rifc=^(1  -  a**)(l  -  anak)\ 

~n  —  1  j 


TT  (afc  ~  ®n) 

lk=i  (1  _  «r®fc)J  d  -  ®rd 


(1  -  knl2)1^2  ("  (ot  —  z) 


Bn (on )V?n d)-^nd)  • 




Note that $B_n(a_n) \ne 0$ so long as $a_n \ne a_k$ for all $k = 1, \ldots, n-1$. Therefore, the sequence $\{\phi_n\} = \{\psi_n B_n\}$ is the Gram-Schmidt orthonormalization of the sequence $\{\psi_n\}$.

To show the necessity of the condition that $\sum_{n=1}^{\infty}(1 - |a_n|) = \infty$, note that by Theorem 5.1.4, any $f \in H^2(\mathbb{D})$, $f$ not identically zero, may be expressed as $f = gB$, where $g \in H^2(\mathbb{D})$ is non-vanishing and $B$ is a Blaschke product. If we have chosen $\{a_n\}$ such that $\sum_{n=1}^{\infty}(1 - |a_n|) < \infty$, by Lemma 5.1.1, the Blaschke product defined by $\{a_n\}$, $B(z) = \prod_{n=1}^{\infty} \frac{a_n - z}{1 - \bar{a}_n z}$, is an element of $H^2(\mathbb{D})$ with $\|B\| = 1$. However, using (38), for each $k = 1, 2, \ldots$, we have that

$$\langle B, \psi_k\rangle = (1 - |a_k|^2)^{1/2}B(a_k) = 0,$$

which shows that $B$ is orthogonal to the space spanned by the $\{\psi_k\}$. However, $\{\phi_k\}$ is the orthonormalization of the set $\{\psi_k\}$, so that this also implies that $B$ is orthogonal to the space spanned by the $\{\phi_k\}$. Hence, $\{\phi_k\}$ is not complete. Therefore, it is necessary that $\sum_{n=1}^{\infty}(1 - |a_n|) = \infty$ in order for $\{\phi_k\}$ to be an orthonormal basis for $H^2(\mathbb{D})$.

The sufficiency of $\sum_{n=1}^{\infty}(1 - |a_n|) = \infty$ for $\{\phi_k\}$ to be complete is seen by assuming that $\{\phi_k\}$ is not complete. This implies the existence of a function $f \in H^2(\mathbb{D})$, $f$ not identically zero, such that $\langle f, \phi_k\rangle = 0$ for all $k = 1, 2, \ldots$. Since each $\psi_k$ is in the span of $\{\phi_n\}_{n=1}^{k}$, using (38) this likewise implies that $\langle f, \psi_k\rangle = (1 - |a_k|^2)^{1/2}f(a_k) = 0$ for all $k = 1, 2, \ldots$. From Theorem 5.1.4, we know that for this to be true, $f$ must be of the form

$$f = Bg,$$

where the Blaschke product $B$ is defined by $B(z) = \prod_{n=1}^{\infty} \frac{a_n - z}{1 - \bar{a}_n z}$ and $g \in H^2(\mathbb{D})$ ($g$ not necessarily non-vanishing in this case). By the definition of a Blaschke product, we have $\sum_{n=1}^{\infty}(1 - |a_n|) < \infty$. Hence, for every non-basis sequence $\{\phi_k\}$, we have that $\sum_{n=1}^{\infty}(1 - |a_n|) < \infty$. Therefore, $\sum_{n=1}^{\infty}(1 - |a_n|) = \infty$ is sufficient for $\{\phi_k\}$ to be complete. □
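The orthonormality of the functions $\phi_n = \psi_n B_n$ can be observed numerically for a few points. The following is a minimal sketch under assumptions not fixed by the text above: the points $a_n$ are hypothetical, $B_n$ is taken to be the product of the factors $(a_k - z)/(1 - \bar{a}_k z)$ for $k < n$, and the inner product is approximated by an equally spaced quadrature on the unit circle. The Gram matrix of the $\phi_n$ should then be the identity up to rounding.

```python
import cmath
import math

def inner(f, g, M=1024):
    """Approximate <f, g> = (1/2pi) * integral of f(e^{it}) * conj(g(e^{it})) dt."""
    total = 0j
    for m in range(M):
        z = cmath.exp(2j * math.pi * m / M)
        total += f(z) * complex(g(z)).conjugate()
    return total / M

def simple_pole(a):
    """psi(z) = (1 - |a|^2)^(1/2) / (1 - conj(a) z) (assumed convention)."""
    scale = math.sqrt(1.0 - abs(a) ** 2)
    return lambda z: scale / (1.0 - a.conjugate() * z)

def blaschke(zeros):
    """Finite Blaschke product; the empty product is 1, matching B_1 = 1."""
    def B(z):
        out = 1.0 + 0j
        for a in zeros:
            out *= (a - z) / (1.0 - a.conjugate() * z)
        return out
    return B

def orthonormal_function(points, n):
    """phi_n = psi_n * B_n with B_n built from the points preceding index n (0-based)."""
    psi = simple_pole(points[n])
    Bn = blaschke(points[:n])
    return lambda z: psi(z) * Bn(z)

points = [0.5 + 0.0j, -0.3 + 0.4j, 0.1 - 0.7j, 0.6j]   # hypothetical distinct points
phis = [orthonormal_function(points, n) for n in range(len(points))]
gram = [[inner(p, q) for q in phis] for p in phis]
max_dev = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(len(phis)) for j in range(len(phis)))
print("max deviation of Gram matrix from identity:", max_dev)
```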

Corollary 5.2.3 For every finite set $\{a_k\}_{k=1}^{K} \subset \mathbb{D}$ such that $a_j \ne a_k$ for all $j \ne k$ and every Blaschke product $B$, the set $\{\psi_k B\}_{k=1}^{K}$ is linearly independent.

Proof. From the proof of Theorem 5.2.2 above, we have that the set $\{\psi_k\}$ is linearly independent. To see that the set $\{\psi_k B\}$ is linearly independent, assume that

$$\sum_{k=1}^{K} c_k \psi_k B = 0$$

for some $\{c_k\}_{k=1}^{K} \subset \mathbb{C}$. Since $B$ is a Blaschke product, it is 0 only at a countable number of points. So for a.e. choice of $z \in \mathbb{D}$, we have $B(z) \ne 0$. Evaluating the function at $z$, we see that

$$\sum_{k=1}^{K} c_k \psi_k(z)B(z) = B(z)\sum_{k=1}^{K} c_k \psi_k(z) = 0,$$

which yields

$$\sum_{k=1}^{K} c_k \psi_k(z) = 0.$$

Since this is true for almost every choice of $z \in \mathbb{D}$, this implies

$$\sum_{k=1}^{K} c_k \psi_k = 0.$$

However, the set $\{\psi_k\}_{k=1}^{K}$ is linearly independent, which implies that $c_k = 0$ for $1 \le k \le K$. Therefore, the set $\{\psi_k B\}_{k=1}^{K}$ is linearly independent also. □


Corollary 5.2.4 Let $\{a_n\} \subset \mathbb{D}$, $\{\psi_n\} \subset H^2(\mathbb{D})$ and $\{B_n\} \subset H^2(\mathbb{D})$ be as defined in Theorem 5.2.2 above, and let $1 < j < k < m < n < \infty$. Then $\psi_n B_m \perp \psi_k B_j$.

Proof. Rearrange the sequence $\{a_n\}$ into a new sequence $\{b_n\}$ so that the desired functions $\psi_n B_m$ and $\psi_k B_j$ are the results of the Gram-Schmidt orthonormalization process on the new sequence. That is, create the sequence $\{b_n\} \subset \mathbb{D}$ from the old sequence $\{a_n\}$
according  to  {bn}  =  {a,-}*=1  +  {a*}  +  {ajf "/+1  +  {a;}£*+1  +  {<*»}  +  {ai}r=m+i  +  K}£n+i, 
where  here,  the  +  operator  indicates  sequence  concatenation.  Then,  this  corollary  follows 
immediately  from  the  above  theorem.  □ 

Theorem  5.2.5  Let  {an)Jfc}  be  such  that  ^^(1  -  |a„f*|)  =  00.  Define  Bn  G  H2(D)  by 

where  Bi  =  1.  Then,  for  Fn  =  span({a„iJ;5n}*^),  we  have  that  Fn  JL  Fm  for  all  n  ^  m 
and  that  H2(P)  =  ®~=i  Fn- 




Proof. From Corollary 5.2.4 above, we have immediately for each $m \ne n$ and each $1 \le k_m \le k(m)$ and $1 \le k_n \le k(n)$, that $\psi_{n,k_n}B_n \perp \psi_{m,k_m}B_m$. Therefore, the sets $\{\psi_{m,k}B_m\}_{k=1}^{k(m)}$ and $\{\psi_{n,k}B_n\}_{k=1}^{k(n)}$ span orthogonal subspaces. That is, $F_m \perp F_n$ for all $m \ne n$.

Next, note that for our choice of $\{a_{n,k}\}$, by Theorem 5.2.2, the span of the set $\{\psi_{n,k}\}$ is $H^2(\mathbb{D})$. Since each $\psi_{n,k}$ is in the space $\bigoplus_{n=1}^{\infty} F_n$, we have that $H^2(\mathbb{D}) = \bigoplus_{n=1}^{\infty} F_n$. □


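Theorem 5.2.6 Let the set $\{a_k\}_{k=1}^{K} \subset \mathbb{D}$ be such that $a_k \ne a_j$ for all $k \ne j$. Let $B \in H^2(\mathbb{D})$ be a Blaschke product with $B(a_k) \ne 0$ for each $a_k$. Define the set $\{\psi_k\}_{k=1}^{K} \subset H^2(\mathbb{D})$ by

$$\psi_k(z) = \frac{(1 - |a_k|^2)^{1/2}}{(1 - \bar{a}_k z)}.$$

Then, the set $\{\psi_k B\}_{k=1}^{K}$ forms a frame for a subspace of $H^2(\mathbb{D})$, and its dual frame, $\{\widetilde{\psi_k B}\}_{k=1}^{K}$, is given by

$$\widetilde{\psi_k B}(z) = \psi_k(z)B(z)\frac{B_k(z)}{B_k(a_k)},$$

where the Blaschke product $B_k$ is defined by

$$B_k(z) = \prod_{j=1,\, j \ne k}^{K} \frac{(a_j - z)}{(1 - \bar{a}_j z)}.$$

Proof. By Corollary 5.2.3, we have that the set $\{\psi_k B\}_{k=1}^{K}$ is linearly independent. Since it is a finite set, this implies that it is an exact frame in some subspace of $H^2(\mathbb{D})$, specifically, the subspace spanned by $\{\psi_k B\}_{k=1}^{K}$.

Using (38) and the result of Theorem 4.1.8 and Lemma 5.1.3, and noting that $\psi_k B_k B$ is the result of performing the Gram-Schmidt orthonormalization process on the set $\{\psi_k B\}_{k=1}^{K}$ with element $\psi_k B$ being the last one to be processed, we have

$$\begin{aligned}
\widetilde{\psi_k B} &= \frac{\psi_k B B_k}{\overline{\langle \psi_k B, \psi_k B B_k\rangle}} \\
&= \frac{\psi_k B B_k}{\overline{\langle \psi_k, \psi_k B_k\rangle}} \\
&= \frac{\psi_k B B_k}{\psi_k(a_k)B_k(a_k)(1 - |a_k|^2)^{1/2}} \\
&= \psi_k B \frac{B_k}{B_k(a_k)}.
\end{aligned}$$

□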
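For an exact frame, the frame and its dual are biorthonormal, which gives a direct numerical check of the dual frame formula. The sketch below is an illustration under assumptions: the points $a_k$ and the zero of $B$ are hypothetical, the factors of all Blaschke products are taken as $(a - z)/(1 - \bar{a}z)$, the inner product is taken to be linear in its first argument and conjugate-linear in its second, and the dual element is taken as $\psi_k B B_k / B_k(a_k)$.

```python
import cmath
import math

def inner(f, g, M=1024):
    """Approximate <f, g> = (1/2pi) * integral of f(e^{it}) * conj(g(e^{it})) dt."""
    total = 0j
    for m in range(M):
        z = cmath.exp(2j * math.pi * m / M)
        total += f(z) * complex(g(z)).conjugate()
    return total / M

def simple_pole(a):
    """psi(z) = (1 - |a|^2)^(1/2) / (1 - conj(a) z) (assumed convention)."""
    scale = math.sqrt(1.0 - abs(a) ** 2)
    return lambda z: scale / (1.0 - a.conjugate() * z)

def blaschke(zeros):
    """Finite Blaschke product with factors (a - z) / (1 - conj(a) z) (assumed form)."""
    def B(z):
        out = 1.0 + 0j
        for a in zeros:
            out *= (a - z) / (1.0 - a.conjugate() * z)
        return out
    return B

points = [0.5 + 0.0j, -0.3 + 0.4j, 0.1 - 0.7j]   # hypothetical distinct a_k
B = blaschke([0.8 + 0.0j])                        # hypothetical B with B(a_k) != 0

def frame_element(k):
    psi = simple_pole(points[k])
    return lambda z: psi(z) * B(z)

def dual_element(k):
    psi = simple_pole(points[k])
    Bk = blaschke(points[:k] + points[k + 1:])    # zeros at every a_j with j != k
    c = Bk(points[k])                             # B_k(a_k)
    return lambda z: psi(z) * B(z) * Bk(z) / c

K = len(points)
pairing = [[inner(frame_element(j), dual_element(k)) for k in range(K)]
           for j in range(K)]
max_dev = max(abs(pairing[j][k] - (1.0 if j == k else 0.0))
              for j in range(K) for k in range(K))
print("max deviation from biorthonormality:", max_dev)
```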


The  following  theorem  gives  a  construction  of  a  frame  for  H2 (D) .  It  is  one  of  the 
main  results  of  this  chapter. 

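Theorem 5.2.7 Let the set $\{a_{n,k}\}$ be defined such that Theorem 3.6.1 applies to the set $\{\psi_{n,k}\}$, where

$$\psi_{n,k}(z) = \frac{(1 - |a_{n,k}|^2)^{1/2}}{1 - \bar{a}_{n,k} z}.$$

Additionally, assume that there exist $0 < m \le M < \infty$ such that for each $n$,

$$m\|\phi_n\| \le \Big(\sum_{k=1}^{k(n)} |\lambda_{n,k}|^2\Big)^{1/2} \le M\|\phi_n\|,$$

where $\phi_n = \sum_{k=1}^{k(n)} \lambda_{n,k}\psi_{n,k}B_n$ and where $B_n \in H^2(\mathbb{D})$ is defined recursively by

$$B_{n+1}(z) = B_n(z)\prod_{k=1}^{k(n)} \frac{a_{n,k} - z}{1 - \bar{a}_{n,k} z},$$

with $B_1(z) = 1$. Then the set $\{\psi_{n,k}B_n\}$ is a frame for $H^2(\mathbb{D})$ with frame bounds $\frac{1}{M^2}$ and $\frac{1}{m^2}$ and with dual frame $\big\{\psi_{n,k}B_n \frac{B_{n,k}}{B_{n,k}(a_{n,k})}\big\}$, where

$$B_{n,k}(z) = \prod_{j=1,\, j \ne k}^{k(n)} \frac{a_{n,j} - z}{1 - \bar{a}_{n,j} z}.$$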

Proof.  For  each  n,  define  Fn  =  span{^ni*i?n}j^i .  From  Theorem  5.2.6,  we  have 
that  the  set  is  an  exact  frame  for  Fn,  with  dual  frame  ^afc  • 

Next,  let  fn  E  Pn.  Since  {^n|jfc-Bn}*=i  is  a  frame  for  Pn,  we  may  express  /n  as 

fn  =  El^fn^n,kBn)i;n,kBn^^y 

Choose  aligned  with  {(/„,  i’n.kBn)}^  such  that  J2k=i  l^„,*|2  =  1.  Defin¬ 


ing  4>n  E  Fn  by  4>n  =  Ylk=i  A n,ki>n,kB„,  we  then  have  \\(j>n\\  < 


«•  Usin§ 




\(fm<f>n)\  <  ||/n||||0n||  <  and  the  fact  that  for  exact  frames,  the  frame  and  its  dual  are 
biorthonormal,  we  have 


\(fnAn)  |  = 


I  k(n)  k(n)  \ 

(  y2{fn,^n,kBn)lpn,kB„— - ~r—  7 »  X]  ^n,k'^n,k'Bn 

\  *=i  Bn<k{an<k)  k,=1  y 


A:(n)  fc(n) 

^  ^  ]  {/n  ?  ^rijk-^n  ^n^k-^n 

k  =  1  fc'=l 

fc(n) 

^  ^  (/n  ?  ^Pn,k-^n  ) 

&  =  1 


#n,Jfc 


■5n,Jfc(ttn,fc) 


1/2 


?  '$n)k,-B\ 


n) 


This  leads  us  to 


/k(n)  \  1/2 

l(/n,<MI  =  \  T/\(fn^n,kBn)n  < 


ll/nll 


m 


(45) 


Now  choose  <f>n  6  jP„,  <f>n  =  J2k=i  K,k’>Pn,kBn,  such  that  <f>n  is  aligned  with  fn  and 
’nil  =  1  (i-e.,  <i>n  =  Using  \{fn,<j>n)\  =  Il/n||||0»||  =  l|/n||  and  (E^IV*!2)  '  < 


M||^>„||  =  M,  we  have 

ll/n||  _  \(fnAn)  | 


M 


M 


M 

1 

M 


j  *{n)  n  k(n)  \ 

(  '52(fn,'<l>n,kBn)'ll>ntkBn— - - -,  Y  K,k‘i>n,k’Bn  ) 

\  k= i  Bnik(anik)  k,=1  / 


k(n) 

^  ^  (fn  5  ) ^n,k 

k  =  l 


< 


1/2 


(46) 




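Combining (45) and (46), we see that for each $n$ and any $f_n \in F_n$, we have

$$\frac{\|f_n\|^2}{M^2} \le \sum_{k=1}^{k(n)} |\langle f_n, \psi_{n,k}B_n\rangle|^2 \le \frac{\|f_n\|^2}{m^2}. \tag{47}$$

From Theorem 5.2.5, we have that $F_n \perp F_m$ for each $n \ne m$ and that $H^2(\mathbb{D}) = \bigoplus_{n=1}^{\infty} F_n$. Therefore, any $f \in H^2(\mathbb{D})$ may be written as

$$f = \sum_{n=1}^{\infty} f_n,$$

where $f_n = P_{F_n}f$ is the orthogonal projection of $f$ onto the subspace $F_n$. By the orthogonality of the subspaces $F_n$, we may write $\|f\|^2 = \sum_{n=1}^{\infty} \|f_n\|^2$. Returning to (47), this gives

$$\frac{\|f\|^2}{M^2} = \sum_{n=1}^{\infty} \frac{\|f_n\|^2}{M^2} \le \sum_{n=1}^{\infty}\sum_{k=1}^{k(n)} |\langle f_n, \psi_{n,k}B_n\rangle|^2 \le \sum_{n=1}^{\infty} \frac{\|f_n\|^2}{m^2} = \frac{\|f\|^2}{m^2}.$$

Since for each $n$ and $k$, we have $\langle f, \psi_{n,k}B_n\rangle = \langle f_n, \psi_{n,k}B_n\rangle$, we may then write

$$\frac{\|f\|^2}{M^2} \le \sum_{n=1}^{\infty}\sum_{k=1}^{k(n)} |\langle f, \psi_{n,k}B_n\rangle|^2 \le \frac{\|f\|^2}{m^2},$$

establishing our frame bounds. □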


The following theorem shows that points $\{a_{n,k}\}$ chosen according to Theorem 3.1.4 are candidates from which to create an orthonormal basis for $H^2(\mathbb{D})$, as in Theorem 5.2.2. That is, $\sum_n \sum_{k=1}^{k(n)} (1 - |a_{n,k}|) = \infty$.

Theorem 5.2.8 Let $\{a_{n,k}\}$ satisfy the conditions of Theorem 3.1.4 with $\delta$, $r_n$ and $M_n$ as defined in that theorem. Then $\sum_n \sum_{k=1}^{k(n)} (1 - |a_{n,k}|) = \infty$.

Proof.  The  method  of  proof  will  be  to  use  the  separation  conditions  from  the 
statement  of  Theorem  3.1.4  to  bound  the  desired  sum.  This  will  be  done  by  using  the 
value  of  the  Euclidean  radius  of  the  pseudo-hyperbolic  ball,  K(a:  6). 




Fix  n  £  Z.  By  the  construction  of  the  set  {ipn,k},  we  can  bound  the  sum  ^^(1  - 
|«n,r|)  above  and  below  according  to 

*(") 

(1  -  Mn)k(n)  <  ]T(1  -  |a„,jfc|)  <  (1  -  i±)k(n)  . 

k  —  l 


To  establish  a  lower  bound  on  fc(rc),  note  that  by  Lemma  3.3.2,  the  radius  r  of  the  ball 
Ii(rn,8 )  is  given  by  r  =  t •  This  radius  is  the  largest  possible  for  balls  K(an}k,8) 

about  points  {an> k}k=i  corresponding  to  {^nfk}k= Since  we  are  looking  for  a  lower  bound 
for  fc(n),  we  can  determine  the  number  of  balls  necessary  to  cover  the  set  {rnetB  :  6  E 
[— 7r,7r)}.  This  number  will  be  approximately  which  gives 


k(n)  > 
> 


2^(1  -  82Tj?) 

26(1  -  rj) 

irr^l  -  82rn2)  1 

6(1  +  Jn)  (1  “  Tn)  ’ 


Since 


TT rn.il -d2rn_2) 

5(l+rn) 


26 


is  bounded  away  from  0  and  is  finite,  we  know  that  for  suffi¬ 


ciently  large  n,  we  may  write 


k(n)  > 


B 

(TTJ 


for  some  constant  B  >  0.  This  gives  for  sufficiently  large  n  that 


5(1  -  Mn) 
(1-In) 


ib(n) 

<  Ed-K*D- 


Jc  =  1 


From  Lemma  3.4.7,  we  have  that  liminf^oo  1  Mjl  >  >  0,  which  implies  for  suffi- 

£n_  * 

ciently  large  n,  we  may  write 


0  < 


4 


< 


fc(n) 


E(1  -  K.*l) 


for  every  n  >  N.  Therefore, 




00  . 


Jfe(n) 

EB'-ivi)  = 

n  k~  1 


□ 


A very desirable result would be to show that, for points $\{a_{n,k}\}$ chosen according to Theorem 3.1.4, the set $\{\psi_{n,k}B_n\}$ is a frame for $H^2(\mathbb{D})$. Unfortunately, this has not yet been shown, although I suspect it is true. However, a similar result has been shown (Theorem 5.2.10 below) that is suitable for the practical applications I have in mind. The following lemma is necessary to prove this result.

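Lemma 5.2.9 Let the sets $S_n \subset \mathbb{D}$ satisfy Condition A, and for each $n$, let $\{a_{n,k}\}_{k=1}^{k(n)} \subset S_n$ be chosen such that $\rho(a_{n,j}, a_{n,k}) \ge \epsilon > 0$ for each $j \ne k$ for some fixed $\epsilon$. Then there exists $\delta > 0$ such that

$$|B_{n,k}(a_{n,k})| \ge \delta > 0$$

for each $n$ and $k$, where

$$B_{n,k}(z) = \prod_{j=1,\, j \ne k}^{k(n)} \frac{a_{n,j} - z}{1 - \bar{a}_{n,j} z}.$$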


Proof.  From  Lemma  3.4.10,  we  have  for  N  sufficiently  large,  A 9n  =  e(l  —  Mn“)  < 
1 6n  j  -  0n}k |,  if  n  >  N  and  j  /  fc,  where  6n}k  =  argaUjfc.  It  can  be  shown  easily  that 
p(a,b)  >  p(c|fj-,C||y)  for  every  a,  6  G  D  and  every  0  <  c  <  min{|a|,  |6|}.  Therefore, 


k(n ) 


(48) 




Using  the  identity 


a  —  b 


1  -  ab 


(i  —  IqI2)(i  -  H2) 

|1  -  ab\2 


in  (48),  we  get 


k(n) 


lBnAan'k){2  -  JLl1  |i4v^,o,2)- 


(49) 


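Since both sides of (49) are strictly positive, elementary computations produce

$$\begin{aligned}
2\ln|B_{n,k}(a_{n,k})| &\ge \sum_{k'=1,\,k'\ne k}^{k(n)} \ln\left(1 - \frac{(1 - r_n^2)^2}{\left|1 - r_n^2 e^{i(\theta_{n,k} - \theta_{n,k'})}\right|^2}\right) \\
&= -\sum_{k'=1,\,k'\ne k}^{k(n)} \sum_{j=1}^{\infty} \frac{1}{j}\left(\frac{(1 - r_n^2)^2}{\left|1 - r_n^2 e^{i(\theta_{n,k} - \theta_{n,k'})}\right|^2}\right)^j \\
&\ge -\sum_{k'=1,\,k'\ne k}^{k(n)} \sum_{j=1}^{\infty} \left(\frac{(1 - r_n^2)^2}{\left|1 - r_n^2 e^{i(\theta_{n,k} - \theta_{n,k'})}\right|^2}\right)^j \\
&= -\sum_{k'=1,\,k'\ne k}^{k(n)} \frac{\dfrac{(1 - r_n^2)^2}{\left|1 - r_n^2 e^{i(\theta_{n,k} - \theta_{n,k'})}\right|^2}}{1 - \dfrac{(1 - r_n^2)^2}{\left|1 - r_n^2 e^{i(\theta_{n,k} - \theta_{n,k'})}\right|^2}} \\
&= -\sum_{k'=1,\,k'\ne k}^{k(n)} \frac{(1 - r_n^2)^2}{2r_n^2\left(1 - \cos(\theta_{n,k} - \theta_{n,k'})\right)} \\
&\ge -2\sum_{j=1}^{k(n)/2} \frac{(1 - r_n^2)^2}{2r_n^2\left(1 - \cos(j\Delta\theta_n)\right)},
\end{aligned} \tag{50}$$

where the last step uses the inequality $|\theta_{n,k} - \theta_{n,k'}| \ge |k - k'|\Delta\theta_n$. Note that, for any $\Delta\theta_n/2 \le \phi \le \pi$,

$$\frac{1}{1 - \cos\phi} \le \frac{2}{\Delta\theta_n}\int_{\phi - \Delta\theta_n/2}^{\phi} \frac{d\theta}{1 - \cos\theta}.$$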



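Therefore,

$$\begin{aligned}
2\sum_{j=1}^{k(n)/2} \frac{1}{1 - \cos(j\Delta\theta_n)} &\le \frac{2}{\Delta\theta_n}\int_{\Delta\theta_n/2}^{2\pi - \Delta\theta_n/2} \frac{d\theta}{1 - \cos\theta} \\
&= \frac{2}{\Delta\theta_n}\left(\cot(\Delta\theta_n/4) - \cot(\pi - \Delta\theta_n/4)\right) \\
&= \frac{4}{\Delta\theta_n}\cot(\Delta\theta_n/4) \\
&\le \frac{16}{(\Delta\theta_n)^2}.
\end{aligned} \tag{51}$$

Incorporating (51) into (50), we have

$$\begin{aligned}
2\ln|B_{n,k}(a_{n,k})| &\ge -\frac{16}{(\Delta\theta_n)^2}\,\frac{(1 - r_n^2)^2}{2r_n^2} \\
&= -\frac{16}{\epsilon^2(1 - M_n^2)^2}\,\frac{(1 - r_n^2)^2}{2r_n^2} \\
&= -\frac{(1 - r_n)^2}{(1 - M_n)^2}\,\frac{8(1 + r_n)^2}{\epsilon^2 r_n^2 (1 + M_n)^2} \\
&\ge -\left(\frac{1 - r_n}{1 - M_n}\right)^2 \frac{8}{\epsilon^2 r_n^2}.
\end{aligned} \tag{52}$$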


1  —  rn 


Since  limsupn^00  <  1_c  and  — >  1,  we  know  there  exists  some  constant  Ce  <  oo 

such  that 


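$$\left(\frac{1 - r_n}{1 - M_n}\right)^2 \frac{8}{\epsilon^2 r_n^2} \le C_\epsilon \tag{53}$$

for all $n$. Incorporating (53) into (52), we get

$$2\ln|B_{n,k}(a_{n,k})| \ge -C_\epsilon$$

for all $n$ and $k$. This implies that

$$|B_{n,k}(a_{n,k})| \ge \delta > 0,$$

where $\delta = e^{-C_\epsilon/2}$.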


□ 
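As a numerical illustration of Lemma 5.2.9, the following sketch evaluates $|B_{n,k}(a_{n,k})|$ for a single level of points. It rests on assumptions that are not part of the lemma: all points lie on one circle of radius `radius` (so that $r_n = M_n$) and are equally spaced in angle, so the minimum angular separation is $\Delta\theta = 2\pi/K$. Under those assumptions, steps (50) and (51) of the proof give the lower bound $\exp\left(-4(1-r^2)^2/(r^2\Delta\theta^2)\right)$. The function names are hypothetical.

```python
import cmath
import math


def rho(a, b):
    """Pseudo-hyperbolic distance between two points of the unit disk."""
    return abs((a - b) / (1 - a.conjugate() * b))


def blaschke_without(points, k, z):
    """B_{n,k}(z): product of (a_j - z) / (1 - conj(a_j) z) over all j != k."""
    value = 1.0 + 0.0j
    for j, a in enumerate(points):
        if j != k:
            value *= (a - z) / (1 - a.conjugate() * z)
    return value


def circle_points(radius, count):
    """Assumed test configuration: equally spaced points on one circle."""
    return [radius * cmath.exp(2j * math.pi * m / count) for m in range(count)]


radius, count = 0.9, 16
points = circle_points(radius, count)
dtheta = 2 * math.pi / count  # minimum angular separation of the points

# |B_{n,k}(a_{n,k})| for every k on this level.
values = [abs(blaschke_without(points, k, a)) for k, a in enumerate(points)]

# Bound obtained from (50) and (51) when every point has modulus `radius`.
proof_bound = math.exp(-4 * (1 - radius**2) ** 2 / (radius**2 * dtheta**2))

print(min(values), proof_bound)
```

For this configuration the computed values sit well above the bound, which is expected since the estimate in (51) is not sharp.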




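Theorem 5.2.10 Let the sets $S_n \subset \mathbb{D}$ satisfy Condition A, and for each $n$, let $\{b_{n,k}\}_k \subset S_n$ be chosen so that there exists an $\epsilon > 0$ such that $\rho(b_{n,j}, b_{n,k}) > \epsilon > 0$ for each $j \ne k$. Fix $0 < M < \infty$. Define the set $\{a_{n,k}\}$ as a reindexing of the set $\{b_{n,k}\}$ such that for each $n$, $\{a_{n,k}\}_k \subset \{b_{n',k}\}_k$ for some $n'$ and such that $k(n) = \#\{a_{n,k}\}_k \le M$. Then the set $\{\psi_{n,k}B_n\}$ is a frame for $H^2(\mathbb{D})$, where

$$\psi_{n,k}(z) = \frac{(1 - |a_{n,k}|^2)^{1/2}}{1 - \overline{a_{n,k}}\,z} \tag{54}$$

and

$$B_{n+1}(z) = B_n(z)\prod_{k=1}^{k(n)} \frac{a_{n,k} - z}{1 - \overline{a_{n,k}}\,z}, \tag{55}$$

with $B_1(z) = 1$.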

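Proof. The upper frame bounds are easy to find. First note that, by Theorem 5.2.5, the sets $\{\psi_{n,k}B_n\}$ and $\{\psi_{n',k}B_{n'}\}$ span orthogonal subspaces for every $n \ne n'$ and we may write

$$f = \sum_n f_n$$

where, for each $n$, $f_n$ is in the span of $\{\psi_{n,k}B_n\}$. The orthogonality of the subspaces gives

$$\|f\|^2 = \sum_n \|f_n\|^2.$$

Thus, for each $n$,

$$\sum_{k=1}^{k(n)} |\langle f, \psi_{n,k}B_n\rangle|^2 = \sum_{k=1}^{k(n)} |\langle f_n, \psi_{n,k}B_n\rangle|^2 = \sum_{k=1}^{k(n)} \left|\left\langle \frac{f_n}{B_n}, \psi_{n,k}\right\rangle\right|^2,$$

where the last equality uses the result of Lemma 5.1.3 and where the division by $B_n$ is justified by noting that $f_n$ is in the span of $\{\psi_{n,k}B_n\}$. However, the set $\{b_{n,k}\}$ satisfies the hypothesis of Theorem 3.1.4, which means that Theorem 3.6.1 applies. Since $\{a_{n,k}\}_k \subset \{b_{n',k}\}_k$ for some $n'$, this implies that

$$\sum_{k=1}^{k(n)} |\langle f, \psi_{n,k}B_n\rangle|^2 = \sum_{k=1}^{k(n)} \left|\left\langle \frac{f_n}{B_n}, \psi_{n,k}\right\rangle\right|^2 \le C\left\|\frac{f_n}{B_n}\right\|^2 = C\|f_n\|^2$$

for some fixed $C < \infty$. Therefore,

$$\sum_{n=1}^{\infty}\sum_{k=1}^{k(n)} |\langle f, \psi_{n,k}B_n\rangle|^2 \le \sum_{n=1}^{\infty} C\|f_n\|^2 = C\|f\|^2.$$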

This  establishes  the  upper  frame  bound. 

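To establish the lower frame bound, note that for each $n$, $f_n$ can be represented by

$$f_n = \sum_{k=1}^{k(n)} \langle f, \psi_{n,k}B_n\rangle\,\psi_{n,k}\,\frac{B_{n,k}B_n}{B_{n,k}(a_{n,k})},$$

where, for each $n$ and $k$, $B_{n,k}$ is defined by

$$B_{n,k}(z) = \prod_{k'=1,\,k'\ne k}^{k(n)} \frac{a_{n,k'} - z}{1 - \overline{a_{n,k'}}\,z}.$$

Choose $\{\gamma_{n,k}\}$ such that $\sum_{k=1}^{k(n)} \gamma_{n,k}\psi_{n,k}B_n$ is aligned with $f_n$ with $\left\|\sum_{k=1}^{k(n)} \gamma_{n,k}\psi_{n,k}B_n\right\| = 1$. Then

$$\begin{aligned}
\|f_n\|^2 &= \left|\left\langle f_n, \sum_{k=1}^{k(n)} \gamma_{n,k}\psi_{n,k}B_n\right\rangle\right|^2 \\
&= \left|\sum_{k=1}^{k(n)} \langle f_n, \psi_{n,k}B_n\rangle\,\overline{\gamma_{n,k}}\right|^2 \\
&\le \left(\sum_{k=1}^{k(n)} |\langle f_n, \psi_{n,k}B_n\rangle|^2\right)\left(\sum_{k=1}^{k(n)} |\gamma_{n,k}|^2\right),
\end{aligned}$$

which implies

$$\frac{\|f_n\|^2}{\sum_{k=1}^{k(n)} |\gamma_{n,k}|^2} \le \sum_{k=1}^{k(n)} |\langle f_n, \psi_{n,k}B_n\rangle|^2. \tag{56}$$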




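Next we need to establish an upper bound, independent of $n$, for the quantity $\sum_{k=1}^{k(n)} |\gamma_{n,k}|^2$. For each $k'$, we have

$$\left|\left\langle \sum_{k=1}^{k(n)} \gamma_{n,k}\psi_{n,k}B_n,\; \psi_{n,k'}\frac{B_n B_{n,k'}}{B_{n,k'}(a_{n,k'})}\right\rangle\right| = |\gamma_{n,k'}| \le \left\|\sum_{k=1}^{k(n)} \gamma_{n,k}\psi_{n,k}B_n\right\|\,\left\|\psi_{n,k'}\frac{B_n B_{n,k'}}{B_{n,k'}(a_{n,k'})}\right\| \le \frac{1}{|B_{n,k'}(a_{n,k'})|}.$$

Therefore,

$$\sum_{k=1}^{k(n)} |\gamma_{n,k}|^2 \le \sum_{k=1}^{k(n)} \frac{1}{|B_{n,k}(a_{n,k})|^2} \le \frac{k(n)}{\min_k |B_{n,k}(a_{n,k})|^2} \le \frac{M}{\min_k |B_{n,k}(a_{n,k})|^2}. \tag{57}$$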


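From Lemma 5.2.9 above, we know that for each $k$ and $n$ and $\widetilde{B}_{n,k}$ defined by

$$\widetilde{B}_{n,k}(z) = \prod_{k'=1,\,k'\ne k}^{k(n)} \frac{b_{n,k'} - z}{1 - \overline{b_{n,k'}}\,z},$$

that

$$|\widetilde{B}_{n,k}(b_{n,k})| = \prod_{k'=1,\,k'\ne k}^{k(n)} \left|\frac{b_{n,k} - b_{n,k'}}{1 - \overline{b_{n,k}}\,b_{n,k'}}\right| \ge \delta > 0$$

for some fixed $\delta$ independent of $n$ and $k$. Since each $\{a_{n,k}\}_k \subset \{b_{n',k}\}_k$, we have that $|B_{n,k}(a_{n,k})| \ge |\widetilde{B}_{n',k'}(b_{n',k'})| \ge \delta$ for some $n'$, $k'$. From (57),

$$\sum_{k=1}^{k(n)} |\gamma_{n,k}|^2 \le \frac{M}{\delta^2}.$$

Combining this with (56), we get

$$\frac{\delta^2}{M}\|f_n\|^2 \le \sum_{k=1}^{k(n)} |\langle f, \psi_{n,k}B_n\rangle|^2,$$

which gives the lower frame bound,

$$\frac{\delta^2}{M}\|f\|^2 \le \sum_{n=1}^{\infty}\sum_{k=1}^{k(n)} |\langle f, \psi_{n,k}B_n\rangle|^2.$$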


□ 


5.3  Projections  into  the  H2( D)  frame 

In this section, a method of projecting into the frame described in the previous section is given. One advantage of this algorithm is that it does not require the points $\{a_{n,k}\}$ to be fully determined a priori, which allows for adaptive frame selections.

In that sense, it is equivalent to the orthogonal matching pursuit algorithm described in [24]. However, the properties of the frame elements used here lead to a nicer representation in the sense that the exact forms resulting from the orthonormalization in the orthogonal matching pursuit are known.

The next lemma is needed to show that a division used in the algorithm is well-defined.

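Lemma 5.3.1 Let $\{a_k\}_{k=1}^{K} \subset \mathbb{D}$ be chosen such that $a_k \ne a_j$ for all $k \ne j$, and let $\{\psi_k\}_{k=1}^{K}$ be defined by

$$\psi_k(z) = \frac{(1 - |a_k|^2)^{1/2}}{1 - \overline{a_k}\,z}.$$

Let $S$ be the space spanned by $\{\psi_k\}_{k=1}^{K}$ and let $S^\perp$ be the orthogonal complement to $S$ in $H^2(\mathbb{D})$. Then, for any $f \in S^\perp$, $f$ is of the form

$$f = Bg,$$

where

$$B(z) = \prod_{k=1}^{K} \frac{a_k - z}{1 - \overline{a_k}\,z}$$

and $g \in H^2(\mathbb{D})$.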




Proof. Since $f \in S^\perp$ and $S^\perp \perp S$, we know that $\langle f, \psi_k\rangle = 0$ for all $k = 1, \ldots, K$. That is,

$$\langle f, \psi_k\rangle = (1 - |a_k|^2)^{1/2} f(a_k) = 0$$

for all $a_k$. Using Theorem 5.1.4, we get

$$f = Bg$$

for some $g \in H^2(\mathbb{D})$.


□ 


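Lemma 5.3.2 Let $\{a_{n,k}\} \subset \mathbb{D}$ be chosen such that Theorem 5.2.7 or Theorem 5.2.10 applies and define $\{\psi_{n,k}\} \subset H^2(\mathbb{D})$ according to

$$\psi_{n,k}(z) = \frac{(1 - |a_{n,k}|^2)^{1/2}}{1 - \overline{a_{n,k}}\,z}.$$

For each $n \in \mathbb{Z}^+$, let $V_n$ be the subspace of $H^2(\mathbb{D})$ spanned by the set $\{\psi_{n,k}\}_{k=1}^{k(n)}$. Similarly, for each $n \in \mathbb{Z}^+$, let $F_n$ be the subspace of $H^2(\mathbb{D})$ spanned by the set $\{\psi_{n,k}B_n\}_{k=1}^{k(n)}$, where

$$B_{n+1}(z) = B_n(z)\prod_{k=1}^{k(n)} \frac{a_{n,k} - z}{1 - \overline{a_{n,k}}\,z},$$

and $B_1(z) = 1$. If $f \in H^2(\mathbb{D})$ is chosen such that $f \perp F_m$ for each $m = 1, \ldots, n-1$, then $P_{F_n}f$ is given by

$$P_{F_n}f = B_n P_{V_n}\frac{f}{B_n}.$$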

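Proof. Let $f \in H^2(\mathbb{D})$ be such that $f \perp F_m$ for each $m = 1, \ldots, n-1$. By Lemma 5.3.1 above, this implies that $f$ is of the form

$$f = gB_n$$

for some $g \in H^2(\mathbb{D})$. From Theorem 5.2.6, we know that $\{\psi_{n,k}B_n\}_{k=1}^{k(n)}$ is a frame in $F_n$ with dual frame $\left\{\psi_{n,k}B_n\frac{B_{n,k}}{B_{n,k}(a_{n,k})}\right\}$, where $B_{n,k}$ is defined by

$$B_{n,k}(z) = \prod_{j=1,\,j\ne k}^{k(n)} \frac{a_{n,j} - z}{1 - \overline{a_{n,j}}\,z}.$$

Theorem 4.1.4 gives that

$$\begin{aligned}
P_{F_n}f &= \sum_{k=1}^{k(n)} \langle gB_n, \psi_{n,k}B_n\rangle\,\psi_{n,k}B_n\frac{B_{n,k}}{B_{n,k}(a_{n,k})} \\
&= B_n\sum_{k=1}^{k(n)} \langle gB_n, \psi_{n,k}B_n\rangle\,\psi_{n,k}\frac{B_{n,k}}{B_{n,k}(a_{n,k})} \\
&= B_n\sum_{k=1}^{k(n)} \langle g, \psi_{n,k}\rangle\,\psi_{n,k}\frac{B_{n,k}}{B_{n,k}(a_{n,k})},
\end{aligned}$$

which (using (20) and again Theorem 5.2.6) is of the form

$$P_{F_n}f = B_n P_{V_n}g,$$

that is,

$$P_{F_n}f = B_n P_{V_n}\frac{f}{B_n}.$$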


□ 


Lemma 5.3.3 Let $\{a_{n,k}\}$, $\{\psi_{n,k}\}$, $\{F_n\}$, and $\{V_n\}$ be as in Lemma 5.3.2. For $f \in H^2(\mathbb{D})$, define $f_n$ by

$$f_n = f_{n-1} - B_{n-1}P_{V_{n-1}}\frac{f_{n-1}}{B_{n-1}}, \tag{58}$$

where $f_1 = f$. Then, for each $n > 1$ and each $m = 1, \ldots, n-1$, we have $f_n \perp F_m$.

Proof. (By induction) By Lemma 5.3.2 above, we have that $P_{F_1}f = B_1P_{V_1}\frac{f}{B_1}$, which implies that for $f_2$ as defined, $f_2 \perp F_1$. Assume that for some $n$, we have $f_n \perp F_m$ for all $m = 1, \ldots, n-1$. Then, $P_{F_n}f_n = B_nP_{V_n}\frac{f_n}{B_n}$, which implies that for $f_{n+1}$ as defined, $f_{n+1} \perp F_n$. However, since by Theorem 5.2.5 we know $F_n \perp F_m$ for all $n \ne m$, we have $f_{n+1} \perp F_m$ for each $m = 1, \ldots, n$. □




Theorem 5.3.4 Let $\{a_{n,k}\}$, $\{\psi_{n,k}\}$, $\{F_n\}$, $\{V_n\}$, and $\{f_n\}$ be as in Lemma 5.3.3 above. For any $f \in H^2(\mathbb{D})$,

$$f = \lim_{n\to\infty}\sum_{k=1}^{n} B_kP_{V_k}\frac{f_k}{B_k}.$$

Proof. From the definition of $f_n$ in (58), we find that for each $n$,

$$f = f_n + \sum_{k=1}^{n-1} B_kP_{V_k}\frac{f_k}{B_k}.$$

From Lemma 5.3.3, we know that $f_n \perp F_m$ for all $m = 1, \ldots, n-1$. Also, since $P_{F_m}f_m = B_mP_{V_m}\frac{f_m}{B_m}$, we may write

$$f = f_n + \sum_{k=1}^{n-1} P_{F_k}f_k.$$

Since the right-hand summation is a sum of projections onto the orthogonal subspaces $\{F_k\}_{k=1}^{n-1}$, we have $f_n \perp \sum_{k=1}^{n-1} P_{F_k}f_k$. That is, $f$ can be represented as the sum of orthogonal elements. Since such a decomposition is unique, and since $f_n \perp F_m$ for $m = 1, \ldots, n-1$, we have

$$P_{S_n}f = \sum_{k=1}^{n} B_kP_{V_k}\frac{f_k}{B_k}$$

and

$$f_{n+1} = P_{S_n^\perp}f,$$

where $S_n = \bigoplus_{k=1}^{n} F_k$ and $S_n^\perp$ is the orthogonal complement to $S_n$ in $H^2(\mathbb{D})$. Since $\bigoplus_{n=1}^{\infty} F_n = H^2(\mathbb{D})$, we have for every $f \in H^2(\mathbb{D})$ that $\lim_{n\to\infty} f_n = 0$. Therefore,

$$f = \lim_{n\to\infty}\sum_{k=1}^{n} B_kP_{V_k}\frac{f_k}{B_k}.$$


□ 
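The recursion (58) can be sketched numerically in the simplest case, which is an assumption made only for this illustration: one point per level, $k(n) = 1$, so that $P_{V_n}g = \langle g, \psi_n\rangle\psi_n$ with $\langle g, \psi_n\rangle = (1 - |a_n|^2)^{1/2}g(a_n)$. The code uses zero-based indices, so `blaschke_prefix(points, n, z)` plays the role of $B_{n+1}(z)$ in the notation of Lemma 5.3.2. The helper names and the test function are hypothetical.

```python
import math


def psi(a, z):
    """Normalized kernel psi(z) = sqrt(1 - |a|^2) / (1 - conj(a) z)."""
    return math.sqrt(1 - abs(a) ** 2) / (1 - a.conjugate() * z)


def blaschke_prefix(points, n, z):
    """Product of (a - z) / (1 - conj(a) z) over the first n points (1 if n == 0)."""
    value = 1.0 + 0.0j
    for a in points[:n]:
        value *= (a - z) / (1 - a.conjugate() * z)
    return value


def partial_sum(points, coeffs, z):
    """Sum of c_n * B_n(z) * psi_n(z) over the coefficients found so far."""
    return sum(
        c * blaschke_prefix(points, n, z) * psi(points[n], z)
        for n, c in enumerate(coeffs)
    )


def expand(f, points):
    """Coefficients c_n = <f_n / B_n, psi_n>, with f_n the residual from (58)."""
    coeffs = []
    for n, a in enumerate(points):
        residual_at_a = f(a) - partial_sum(points, coeffs, a)   # f_n(a_n)
        g_at_a = residual_at_a / blaschke_prefix(points, n, a)  # (f_n / B_n)(a_n)
        coeffs.append(math.sqrt(1 - abs(a) ** 2) * g_at_a)
    return coeffs


def f(z):
    """Hypothetical test function, analytic on the closed unit disk."""
    return 1 / (2 - z)


points = [0.5 + 0.0j, -0.3 + 0.4j, 0.2j, 0.7 + 0.0j]  # distinct points of the disk
coeffs = expand(f, points)
for a in points:
    print(abs(partial_sum(points, coeffs, a) - f(a)))
```

Because each residual $f_{n+1}$ vanishes at the points already used, the partial sum reproduces $f$ at those points, which is the sampling property mentioned in the chapter summary. Points may also be appended one at a time, since each coefficient depends only on earlier points.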




5.4  Summary 

In this chapter, I presented two theorems (Theorems 5.2.7 and 5.2.10) which stated results about frames for $H^2(\mathbb{D})$. These frames are useful because they are created from frame elements which sample the values of elements of $H^2(\mathbb{D})$ at specific points. This sampling property will be especially useful in practical applications where values of functions may be known only at specific sample points. These frames will be used in the development of frames tailored for speech in Chapter VI, where isometric isomorphisms between $H^2(\mathbb{D})$ and $L^2(\mathbb{R}^+)$ will be employed to use these results as building blocks for the $L^2(\mathbb{R})$ frame developed in Chapter IV.

I  also  developed  a  method  by  which  representations  in  these  H2(D)  frames  can  be 
found,  without  having  all  of  the  frame  elements  determined  a  priori.  This  method  is  a 
specific  implementation  of  an  orthogonal  matching  pursuit  algorithm,  where  the  properties 
of  the  frame  elements  are  used  to  get  exact  forms  for  the  representation  coefficients.  These 
exact  forms  were  used  when  possible  in  the  computer  program  described  in  Chapter  VI  to 
give  faster  calculations  and  more  precise  values. 




VI.  Application  to  speech  representation 

The  theory  presented  in  Chapters  III,  IV,  and  V  can  be  used  together  in  the  area 
of  speech  processing  and  representation.  The  goal  of  this  chapter  is  to  demonstrate  an 
application  of  the  theory  to  speech  processors  to  make  the  connection. 

Since  this  chapter  is  concerned  with  connecting  several  theorems  together  into  an 
application,  the  main  result  of  this  chapter  is  not  a  theorem  or  theorems,  but  rather  the 
sum  total  of  the  illustrated  connections.  This  includes  a  frame  tailored  to  speech  processing 
and  the  description  of  an  application  program  which  finds  approximations  to  speech  based 
on  this  frame. 

This chapter is organized into three sections: one dealing with theoretical considerations, the second with practical considerations, and the last describing the implementation
of  the  computer  program.  Section  6.1  formulates  a  frame  for  speech  based  on  the  work  in 
the  previous  chapters.  An  estimate  is  given  for  the  norm  of  the  frame-like  operator,  for 
use  when  projecting  into  the  frame.  Section  6.2  is  concerned  with  the  practical  application 
aspects,  such  as  the  issue  of  having  available  only  discretely  sampled  speech  instead  of 
continuous  speech.  Section  6.3  describes  a  computer  program  which  uses  the  results  of  the 
other  two  sections.  Specific  attention  is  given  to  design  considerations  which  required  the 
program  to  vary  from  the  theory. 

6.1  A  frame  for  speech 

In this section, connections are established between the theory of the previous chapters and the properties of speech to create a frame tailored to speech. First, the frame to be created will be described, which will make clearer the necessary mathematical justification to follow. Second, the necessary proofs will be presented to justify the creation of the frame.

6.1.1 Description of the speech frame. As discussed in Chapter II, speech can be considered a slowly time-varying function. For harmonic speech, the speech signal can be approximated well by the response of a slowly time-varying system to a sequence of impulse functions, where the impulses are aligned with the glottal pulses in the speech. The decay of the vocal response is very rapid, so that the approximate start of the response for each impulse function (glottal pulse) is readily apparent.




This would appear to correspond well to a frame based on Theorem 4.2.4 with the frames used at each time shift being chosen to well represent the rapid decay of the impulse response. As shown below in Lemma 6.2.4, the basis functions of the frames defined in Theorems 5.2.7 and 5.2.10 can be transformed into decaying exponentials, which give a good representation of both the harmonic nature of harmonic speech and of the decay of the impulse response. Due to the rapid decay of the impulse response, it intuitively makes sense to limit the length of the basis function (given by $I_n$ for each $n$) so that the number of intervals overlapping with interval $[t_n, t_n + I_n)$ for each $n$ is small.

To define this frame, we will need to do several things. First, we will need to establish an isometric isomorphism between $H^2(\mathbb{D})$ and $L^2(\mathbb{R}^+)$, so that we may use the frame of Theorem 5.2.10 in the construction of the frame of Theorem 4.2.4. In order to use the iterative method given in Theorem 4.3.4 to find representations in this frame, it will be necessary to have a reasonable estimate for the upper bound $B$. Theorem 6.1.3 solves this problem for the frame-like construct to be used. Next, we will need to define the sets $S_n$ used in Theorem 3.1.4 and show that they satisfy the necessary conditions.

6.1.2 Isometry between $H^2(\mathbb{D})$ and $L^2(\mathbb{R}^+)$. The frames in Chapter V are based on the $H^2(\mathbb{D})$ representation given in Chapter III. To use these results with Theorem 4.2.4 we must connect the spaces $H^2(\mathbb{D})$ and $L^2(\mathbb{R}^+)$ in some way. In this section, we will demonstrate an isometric isomorphism between them.

Used as an intermediate space in this transform will be the Hardy space $H^2(\mathbb{C}^+)$, where $\mathbb{C}^+ = \{z \in \mathbb{C} : \mathrm{Re}(z) > 0\}$. While there is more than one way to define this space [12], we will use the following:

$$H^2(\mathbb{C}^+) = \left\{f \text{ analytic in } \mathbb{C}^+ : \lim_{x\to 0^+}\int_{-\infty}^{\infty} |f(x + iy)|^2\,dy = \sup_{x>0}\int_{-\infty}^{\infty} |f(x + iy)|^2\,dy < \infty\right\}$$

with

$$\|f\|^2_{H^2(\mathbb{C}^+)} = \lim_{x\to 0^+}\int_{-\infty}^{\infty} |f(x + iy)|^2\,dy = \sup_{x>0}\int_{-\infty}^{\infty} |f(x + iy)|^2\,dy.$$

Note that it is not true that the supremum "occurs" as $x \to 0^+$ for all analytic functions with a bounded supremum on $\mathbb{C}^+$. Therefore, the restriction that the supremum occur as $x \to 0^+$ must remain in the above definition.




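As shown in Theorem 3.2.2 for functions in $H^2(\mathbb{D})$, a function $f \in H^2(\mathbb{C}^+)$ can be extended to the imaginary axis, where the extension is given by $f(iy) = \lim_{x\to 0^+} f(x + iy)$. The resulting function is in $L^2(i\mathbb{R})$ with $\|f\|_{L^2} = \|f\|_{H^2(\mathbb{C}^+)}$.

An isometric isomorphism between $H^2(\mathbb{C}^+)$ and $H^2(\mathbb{D})$ is given by $S: H^2(\mathbb{C}^+) \to H^2(\mathbb{D})$, defined by

$$Sf(z) = \frac{2\sqrt{\pi}}{1 + z}\,f\left(\frac{1 - z}{1 + z}\right), \quad f \in H^2(\mathbb{C}^+), \tag{59}$$

and inverted by

$$S^{-1}f(s) = \frac{1}{\sqrt{\pi}\,(1 + s)}\,f\left(\frac{1 - s}{1 + s}\right), \quad f \in H^2(\mathbb{D}). \tag{60}$$

The Laplace transform, $\mathcal{L}: L^2(\mathbb{R}^+) \to H^2(\mathbb{C}^+)$, provides an isometric isomorphism between $L^2(\mathbb{R}^+)$ and $H^2(\mathbb{C}^+)$ when defined by

$$\mathcal{L}f(s) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} f(t)e^{-st}\,dt. \tag{61}$$

This definition varies by a constant from the usual one. Its inverse is defined by

$$\mathcal{L}^{-1}f(t) = \frac{1}{i\sqrt{2\pi}}\int_C f(s)e^{st}\,ds, \tag{62}$$

where $C$ is a Bromwich path with real part greater than that of all of the singularities of $f$ (i.e., to the right of all of the singularities of $f$).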

Using the Laplace transform in (61) and the transform $S$ defined in (59), we may define an isometric isomorphism $T: L^2(\mathbb{R}^+) \to H^2(\mathbb{D})$ by

$$T = S\mathcal{L}, \tag{63}$$

with $T^{-1} = \mathcal{L}^{-1}S^{-1}$. This transform will provide the connection between $L^2(\mathbb{R}^+)$ and $H^2(\mathbb{D})$ necessary to combine the results of Chapters III and V with the results in Chapter IV.
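The action of $T$ on a decaying exponential can be checked numerically. The sketch below assumes that $S$ has the form $(SF)(z) = \frac{2\sqrt{\pi}}{1+z}F\left(\frac{1-z}{1+z}\right)$, with the constant chosen so that $S$ is isometric for the norms used here; that normalization is an assumption of this illustration. For $f(t) = e^{-bt}$ with $\mathrm{Re}(b) > 0$, (61) gives $\mathcal{L}f(s) = 1/(\sqrt{2\pi}(s+b))$, and under the assumed $S$ one finds $Tf(z) = \sqrt{2}/((1+b) - (1-b)z)$, a multiple of a Cauchy kernel. The function names are hypothetical.

```python
import cmath
import math


def laplace_of_exponential(b, s):
    """(61) applied to f(t) = exp(-b t): 1 / (sqrt(2 pi) (s + b))."""
    return 1 / (math.sqrt(2 * math.pi) * (s + b))


def S(F, z):
    """Assumed form of S: (S F)(z) = 2 sqrt(pi) / (1 + z) * F((1 - z) / (1 + z))."""
    return 2 * math.sqrt(math.pi) / (1 + z) * F((1 - z) / (1 + z))


def h2_disk_norm_squared(G, samples=4096):
    """(1 / 2 pi) * integral of |G(e^{i theta})|^2 d theta, by the trapezoid rule.

    The half-step offset keeps the nodes away from z = -1.
    """
    total = 0.0
    for m in range(samples):
        theta = 2 * math.pi * (m + 0.5) / samples
        total += abs(G(cmath.exp(1j * theta))) ** 2
    return total / samples


b = 0.5 + 3.0j  # decay rate 0.5, angular frequency 3 (arbitrary test values)


def Tf(z):
    """T f = S L f for f(t) = exp(-b t)."""
    return S(lambda s: laplace_of_exponential(b, s), z)


norm_in_disk = h2_disk_norm_squared(Tf)
norm_on_half_line = 1 / (2 * b.real)  # integral of |exp(-b t)|^2 over t >= 0
print(norm_in_disk, norm_on_half_line)
```

The two printed norms agree to quadrature accuracy, which is consistent with $T$ being an isometry and with decaying exponentials corresponding to the kernel functions used in Chapter V.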

Given Theorem 6.1.1 below, we know that the transform $T^{-1}: H^2(\mathbb{D}) \to L^2(\mathbb{R}^+)$ will take frames of the type described in Chapter V into an equivalent frame in $L^2(\mathbb{R}^+)$. Since we have already assumed that speech is an element of $L^2(\mathbb{R}^+)$, one might ask: "Why not simply use such a frame to find representations of speech?" The answer is that speech has additional properties which may not be well represented by such a frame. That is, while a sample of speech could be represented by its frame expansion in any frame, an approximation to a desired degree of accuracy may require more frame elements to be kept in some frames than in others. This is what motivates use of the frame described in Theorem 4.2.4.

6.1.3 Frame for speech. Having illustrated the isometric isomorphism $T$ between the spaces $L^2(\mathbb{R}^+)$ and $H^2(\mathbb{D})$, we are ready to illustrate a frame tailored to speech processing. First, we will need the following, rather trivial, but useful, theorem.

Theorem 6.1.1 Let $\mathcal{H}$ and $\mathcal{H}'$ be separable, isometric, Hilbert spaces and let $T: \mathcal{H} \to \mathcal{H}'$ be an isometric isomorphism between them. Let $\{\psi_k\}$ be a frame in $\mathcal{H}$ with frame bounds $A$ and $B$. Then $\{T\psi_k\}$ is a frame in $\mathcal{H}'$ with frame bounds $A$ and $B$.

Proof. Since $\{\psi_k\}$ is a frame in $\mathcal{H}$, we may write

$$A\|f\|^2 \le \sum_k |\langle f, \psi_k\rangle|^2 \le B\|f\|^2$$

$$A\|Tf\|^2 \le \sum_k |\langle Tf, T\psi_k\rangle|^2 \le B\|Tf\|^2$$

where we have used that $\|f\| = \|Tf\|$ and $\langle f, g\rangle = \langle Tf, Tg\rangle$ for all $f, g \in \mathcal{H}$. Since any $g \in \mathcal{H}'$ can be written as $g = TT^{-1}g$ and $T^{-1}g \in \mathcal{H}$, we have that

$$A\|TT^{-1}g\|^2 \le \sum_k |\langle TT^{-1}g, T\psi_k\rangle|^2 \le B\|TT^{-1}g\|^2,$$

which leads immediately to

$$A\|g\|^2 \le \sum_k |\langle g, T\psi_k\rangle|^2 \le B\|g\|^2.$$


□ 

We will now combine the major results of the previous chapters to create a frame tailored to speech representation. Having shown an isometric isomorphism between $L^2(\mathbb{R}^+)$ and $H^2(\mathbb{D})$, we may combine the results of Theorem 5.2.10, which provided a frame for $H^2(\mathbb{D})$, and Theorem 4.2.4, which provided a composite frame for $L^2(\mathbb{R})$ based on a frame or frames for $L^2(\mathbb{R}^+)$. Assume a frame for $H^2(\mathbb{D})$ of the type defined in Theorem 5.2.10 (to be defined further in Section 6.1.5) and denote its representation in $L^2(\mathbb{R}^+)$ under the isomorphism given in (63) as $\{\phi_k\}$.

Theorem 4.2.4 provides a frame that is a set of translated and windowed frame elements from a frame or set of frames. Using the frame $\{\phi_k\}$ as the basis for this frame, we have a frame $\{\phi_{j,k}\}$, where the index $j$ represents the $j$th translation.

Theorem 4.3.1 requires a set of subspaces $\{V_j\}$ for its use. Defining the subspaces $V_j$ according to

$$V_j = \mathrm{span}\{\phi_{j,k}\}_k,$$

we are poised to find representations in the frame $\{\phi_{j,k}\}$.

The next step is to examine how representations in this composite frame may be found. This will be done by using Theorem 4.3.1, which gives a generalization of the frame operator based on subspaces; Theorem 4.1.4, which gives a representation of projections onto subspaces; and Theorem 4.3.4, which gives an iterative solution for the desired representation in the generalized frame.

To do this, we must first define the projection operators $P_{V_j}$. Given that the set $\{\phi_{j,k}\}_k$ is a frame for the subspace $V_j$, Theorem 4.1.4 allows us to define the projection operator $P_{V_j}$ by the frame expansion. That is, we define $P_{V_j}$ by

$$P_{V_j}f = \sum_k \langle f, \phi_{j,k}\rangle\,\tilde{\phi}_{j,k},$$

where $\{\tilde{\phi}_{j,k}\}_k$ is the dual frame to $\{\phi_{j,k}\}_k$ in the subspace $V_j$.

Given a representation for the projection operators $P_{V_j}$, Theorem 4.3.4 provides an iterative solution for a representation of elements $f \in \mathcal{H}$, where $\mathcal{H} = \sum_j V_j$ and the subspaces $V_j$ are not necessarily orthogonal. Approximations to $f$ can be found by truncating the representation at some finite $N$.

6.1.4 Estimates for the bounds $A$ and $B$. As shown in Theorem 4.3.4, the values of the true frame bounds $A$ and $B$ are not necessary in order for the iterative method of inverting the operator $F$ (given in Theorem 4.3.4) to converge. For example, any $B'$ where $B \le B' < \infty$ will work in place of $B$ and any $A'$ where $0 < A' < B'$ will work in place of $A$. However, faster convergence can be achieved by using good estimates $A'$ and $B'$ for bounds $A$ and $B$, provided $0 < A' \le A \le B \le B' < \infty$ can be shown to be true.

The following lemma will be useful in the proof of Theorem 6.1.3, which determines a good estimate of $B$. It determines an upper bound on the number of intervals $[t_n, t_n + I_n) \subset \mathbb{R}$ that can contain any point $t \in \mathbb{R}$, given constraints on the magnitude of the $\{I_n\}$ and on the spacing of the points $\{t_n\}$.

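Lemma 6.1.2 Let $\epsilon$, $\delta$, $\{t_n\}$, and $\{I_n\}$ be as in Theorem 4.2.4. Assume that there exists $0 < M < \infty$ such that $I_n \le M$ for each $n \in \mathbb{Z}$. Define $X: \mathbb{R} \to \mathbb{Z}^+$ according to

$$X(t) = \sum_n 1_{[t_n, t_n + I_n)}(t),$$

where $1_I$ is the characteristic function over the interval $I$. Then $X(t) \le \lceil M/\epsilon \rceil$ for all $t \in \mathbb{R}$, where $\lceil\cdot\rceil$ represents the least integer upper bound.

Proof. From the conditions of Theorem 4.2.4, we have $k\epsilon \le t_{n+k} - t_n$ for every $k \in \mathbb{Z}^+$ and for all $n$. Choose $k = \lceil M/\epsilon \rceil$. Then $t_{n+k} - t_n \ge \epsilon\lceil M/\epsilon \rceil \ge M \ge I_n$. This implies that $[t_n, t_n + I_n)$ and $[t_{n+k}, t_{n+k} + I_{n+k})$ are disjoint. That is, any point $t \in \mathbb{R}$ can be contained in at most $\lceil M/\epsilon \rceil$ intervals, and so $X(t) \le \lceil M/\epsilon \rceil$ for every $t \in \mathbb{R}$. □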
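The counting bound of Lemma 6.1.2 is easy to check by experiment. The sketch below builds a finite set of intervals with start spacing at least $\epsilon$ and lengths below $M$, then counts how many intervals cover each test point. The interval generator, the seed, and the parameter values are hypothetical choices for the illustration only.

```python
import math
import random


def overlap_count(starts, lengths, t):
    """X(t): number of intervals [t_n, t_n + I_n) that contain t."""
    return sum(1 for t_n, i_n in zip(starts, lengths) if t_n <= t < t_n + i_n)


def make_intervals(count, eps, max_len, seed=0):
    """Random starts with spacing at least eps and lengths in [0, max_len)."""
    rng = random.Random(seed)
    starts, t = [], 0.0
    for _ in range(count):
        starts.append(t)
        t += eps * (1 + rng.random())  # consecutive starts differ by >= eps
    lengths = [max_len * rng.random() for _ in range(count)]
    return starts, lengths


eps, max_len = 1.0, 2.5
starts, lengths = make_intervals(200, eps, max_len)
bound = math.ceil(max_len / eps)  # the quantity ceil(M / eps) of the lemma

grid = [0.01 * i for i in range(40000)]
largest = max(overlap_count(starts, lengths, t) for t in grid)
print(largest, bound)
```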

The following theorem will show that the quantity $\lceil M/\epsilon \rceil$ in the above lemma can be used to determine an upper bound on the quantity $B$ in Theorem 4.3.1.

Theorem 6.1.3 Let $\epsilon$, $\{I_n\}$, and $\{\phi_{n,k}\}$ be as described in Theorem 4.2.4. For each $j$, define $V_j = \mathrm{span}_k\{\phi_{j,k}\}$. Assume that for $0 < M < \infty$, that $I_n \le M$ for each $n$. Let $\{c_j\} \subset \mathbb{R}^+$ and assume that $c_j \le C < \infty$ for each $j$. Define the operator $F: L^2(\mathbb{R}) \to L^2(\mathbb{R})$ as in Theorem 4.3.1, by

$$Ff = \sum_j c_jP_{V_j}f.$$

Then, for each $f \in L^2(\mathbb{R})$,

$$\sum_j c_j\|P_{V_j}f\|^2 \le CM_\epsilon\|f\|^2,$$

where $M_\epsilon = \lceil M/\epsilon \rceil$.




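Proof. First, we know by Lemma 6.1.2 that there are at most $M_\epsilon$ intervals $[t_n, t_n + I_n)$ which contain any point $t \in \mathbb{R}$. This gives that for each $m = 1, \ldots, M_\epsilon$, the elements of the set $\{V_{m+kM_\epsilon}\}_{k\in\mathbb{Z}}$ are orthogonal subspaces, since the corresponding intervals on which they are defined are pairwise disjoint. Denoting $W_m = \bigoplus_k V_{m+kM_\epsilon}$, we have for each $f \in \mathcal{H}$ that

$$\|P_{W_m}f\|^2 = \left\|\sum_k P_{V_{m+kM_\epsilon}}f\right\|^2 = \sum_k \|P_{V_{m+kM_\epsilon}}f\|^2.$$

Using this, we find that

$$\begin{aligned}
\sum_j c_j\|P_{V_j}f\|^2 &\le C\sum_j \|P_{V_j}f\|^2 \\
&= C\sum_{m=1}^{M_\epsilon}\sum_{k\in\mathbb{Z}} \|P_{V_{m+kM_\epsilon}}f\|^2 \\
&= C\sum_{m=1}^{M_\epsilon} \left\|\sum_{k\in\mathbb{Z}} P_{V_{m+kM_\epsilon}}f\right\|^2 \\
&= C\sum_{m=1}^{M_\epsilon} \|P_{W_m}f\|^2 \\
&\le C\sum_{m=1}^{M_\epsilon} \|f\|^2 \\
&= CM_\epsilon\|f\|^2.
\end{aligned}$$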


□ 


6.1.5 Frames from $H^2(\mathbb{D})$. Given the frame described in words in Section 6.1.3, it is still necessary to define the frames for $H^2(\mathbb{D})$ to be used in its construction. These will be formed in accordance with Theorem 3.1.4 with the sets $S_n$ defined by

$$S_n = \left\{z \in \mathbb{C} : z = \frac{1 - (x_n + iy)}{1 + (x_n + iy)},\ |y| \le y_{max}\right\} \cup \left\{z \in \mathbb{C} : z = e^{i\theta}\left|\frac{1 - (x_n + iy_{max})}{1 + (x_n + iy_{max})}\right|,\ |\theta| \ge \left|\arg\frac{1 - (x_n + iy_{max})}{1 + (x_n + iy_{max})}\right|\right\}, \tag{64}$$

where $x_n \to 0^+$ and $0 < y_{max} < \infty$.




It is necessary to show that these curves indeed satisfy the constraints of Theorem 3.1.4. The following lemma will be of use.


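Lemma 6.1.4 Fix $x > 0$ and $\Delta y \in \mathbb{R}$. Then, for every $y \in \mathbb{R}$,

$$\rho\left(\frac{1 - (x + iy)}{1 + (x + iy)}, \frac{1 - (x + i(y + \Delta y))}{1 + (x + i(y + \Delta y))}\right) = \rho\left(\frac{1 - x}{1 + x}, \frac{1 - (x + i\Delta y)}{1 + (x + i\Delta y)}\right) = \left|\frac{\Delta y}{2x - i\Delta y}\right|.$$

Proof. By the definition of $\rho$,

$$\begin{aligned}
\rho\left(\frac{1 - (x + iy)}{1 + (x + iy)}, \frac{1 - (x + i(y + \Delta y))}{1 + (x + i(y + \Delta y))}\right)
&= \left|\frac{\dfrac{1 - (x + iy)}{1 + (x + iy)} - \dfrac{1 - (x + i(y + \Delta y))}{1 + (x + i(y + \Delta y))}}{1 - \dfrac{1 - (x + iy)}{1 + (x + iy)}\cdot\dfrac{1 - (x - i(y + \Delta y))}{1 + (x - i(y + \Delta y))}}\right| \\
&= \left|\frac{(1 - (x + iy))(1 + (x + i(y + \Delta y))) - (1 - (x + i(y + \Delta y)))(1 + (x + iy))}{(1 + (x + iy))(1 + (x - i(y + \Delta y))) - (1 - (x - i(y + \Delta y)))(1 - (x + iy))}\right| \\
&= \left|\frac{\Delta y}{2x - i\Delta y}\right| \\
&= \rho\left(\frac{1 - x}{1 + x}, \frac{1 - (x + i\Delta y)}{1 + (x + i\Delta y)}\right).
\end{aligned}$$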


□ 
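Lemma 6.1.4 says that the pseudo-hyperbolic distance between the images of $x + iy$ and $x + i(y + \Delta y)$ does not depend on $y$. A quick numerical check, with arbitrary test values for $x$, $\Delta y$, and $y$ and hypothetical function names, might look like this:

```python
def cayley(s):
    """Map the right half-plane to the unit disk: s -> (1 - s) / (1 + s)."""
    return (1 - s) / (1 + s)


def rho(a, b):
    """Pseudo-hyperbolic distance in the unit disk."""
    return abs((a - b) / (1 - a.conjugate() * b))


x, dy = 0.05, 0.3
expected = abs(dy / (2 * x - 1j * dy))  # closed form from Lemma 6.1.4

distances = [
    rho(cayley(complex(x, y)), cayley(complex(x, y + dy)))
    for y in (-4.0, -0.5, 0.0, 1.0, 7.5)
]
print(distances, expected)
```

Equal spacing in $y$ along a line $\mathrm{Re}(s) = x$ therefore produces points in the disk that are equally spaced in the pseudo-hyperbolic sense, which is what makes the sets $S_n$ in (64) convenient for choosing sample points.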


The  following  theorem  will  show  that  the  sets  Sn  defined  in  (64)  are  suitable  for  use 
with  Theorem  3.1.4. 

Theorem 6.1.5 The sets $S_n$, defined in (64), satisfy Condition A with

$$\limsup_{n\to\infty} \frac{M_n - r_n}{1 - M_nr_n} \le \frac{y_{max}^2}{2 + y_{max}^2} < 1.$$

Proof. Given the restrictions of Condition A, by inspection of (64) it can be seen that

$$r_n = \frac{1 - x_n}{1 + x_n} \tag{65}$$

and

$$M_n = \frac{|1 - (x_n + iy_{max})|}{|1 + (x_n + iy_{max})|}. \tag{66}$$

Also by inspection, it can be seen that $x_n \to 0$ implies $r_n \to 1$.

To  show  that  rn-^T,  note  that  r„  <  rn(0)  for  all  0  G  [— re,  it).  Therefore,  since  — *■  1 

and  r„(0)  <  1  for  all  n  and  0  G  [ — 7r,  7r),  we  have 


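To see that $\limsup_{n\to\infty} \frac{M_n - r_n}{1 - M_nr_n} \le \frac{y_{max}^2}{2 + y_{max}^2}$, we use the values of $M_n$ and $r_n$ in (65) and (66) to get

$$\begin{aligned}
\frac{M_n - r_n}{1 - M_nr_n} &= \frac{\dfrac{|1 - (x_n + iy_{max})|}{|1 + (x_n + iy_{max})|} - \dfrac{1 - x_n}{1 + x_n}}{1 - \dfrac{|1 - (x_n + iy_{max})|}{|1 + (x_n + iy_{max})|}\cdot\dfrac{1 - x_n}{1 + x_n}} \\
&= \frac{(1 + x_n)|1 - (x_n + iy_{max})| - (1 - x_n)|1 + (x_n + iy_{max})|}{(1 + x_n)|1 + (x_n + iy_{max})| - (1 - x_n)|1 - (x_n + iy_{max})|} \\
&= \frac{-(1 - x_n^2) + \sqrt{1 - 2x_n^2 + 2y_{max}^2 + x_n^4 + 2x_n^2y_{max}^2 + y_{max}^4}}{2 + 2x_n^2 + y_{max}^2} \\
&\le \frac{-(1 - x_n^2) + \sqrt{1 + 2x_n^2 + 2y_{max}^2 + x_n^4 + 2x_n^2y_{max}^2 + y_{max}^4}}{2 + 2x_n^2 + y_{max}^2} \\
&= \frac{2x_n^2 + y_{max}^2}{2 + 2x_n^2 + y_{max}^2}.
\end{aligned}$$

Therefore,

$$\limsup_{n\to\infty} \frac{M_n - r_n}{1 - M_nr_n} \le \lim_{n\to\infty} \frac{2x_n^2 + y_{max}^2}{2 + 2x_n^2 + y_{max}^2} = \frac{y_{max}^2}{2 + y_{max}^2} < 1.$$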


To  show  that  7^  ^  Ao  as  n  00,  first  note  that  we  need  only  be  concerned  with 


1  M  nrn 


the  region  of  the  set  given  by  ,  0  <  y  <  ymax ,  since  5n  is  symmetric  and  since 

the  region  of  the  set  outside  that  given  by  — ymax  <  y  <  2/mar  is  simply  a  segment  of  a  circle 
centered  on  the  origin,  for  which  is  identically  zero.  Note  that  for  the  remainder 

of  the  proof,  the  notation  (0  denotes  ‘  This  *s  done  for  convenience  in 


handling  complicated  arguments  to  the  function  • 




Solving  for  ye  such  that  ye  >  0  and  1+jrtSS))^  =  e’  we  have  from 

Lemma  6.1.4  that 


€  = 


Vj. 

2x  —  iye 


which  gives 


Vt 


2xe 


Defining  A yn  =  ye  —  ^=J=?,  we  may  determine  the  values  of  rn  and  Mn  evaluated  at 
various  points.  By  the  nature  of  the  sets  S„,  for  A yn  <  y  <  ymax, 


r 


71 


1  ~  (gn  + 

1  +  (^n  +  %y)J  ) 


l  -  (xn  +Jjy  -  A y„)) 

1  +  {xn  +  i(y  -  A yn)) 


(67) 


and  for  o  <  y  <  A yn, 


r„ 


n-(xn  +  iy) 
V 1  +  (*„  +  iy) 


\  - 

1  xn 

)  - 

l  xn 

Likewise,  for  0  <  y  <  ymax  -  A yn , 


Mn 


l-(g„  +  tg)\\ 

1  +  (x„  +  iy)J ) 


1  -  (xn  +  i(y_+  Ayn)) 
1  +  (x„  +  i{y  +  Ayn)) 


and  for  ymax  -  A yn  <y<  ymax , 


Mn 


(  1  (xn  4" 

\  1  -(-  ( xn  iym  ax)  /  / 


1  (#n  4“ 

1  4"  4"  ^1/mor) 


(68) 


Examining  the  nature  of  the  sets  Sn  in  (64),  we  note  that  for  ymax  —  A yn  <  y  <  ymax ? 
we  have 


Tn 

(z,Tc(1~(Xn  +  iyA')  - 

M-n  T n 

Tar  ^  (^n  ^(^maa;  ^Vn ))  "\  \ 

.1  -  Mnrn_ 

V  §V1  +  {xn  +  iy))j 

1  -  Mnrn . 

V  ^  \1  4"  (#n  4-  ^'(ymaar  ~  Aj/n))/  / 

and  for  0  <  y  <  A yn  we  have 


1  -  Mnrn_ 


l-(xn  +  iy)\\ 
1  4-  (^n  4-  iy)J  ) 


< 


'  Mw  -  rn  1  /  /I  -  (a?w  +  iAyn)\\ 
.1  -  Mnrn\  VS  U  +  (xn  +  iAyn)JJ 




Ma  ~r, 


at  values  in 


we  need  only  consider  the  case  of  A yn  <  y  <  ymax  —  A yn.  Evaluating 
this  range,  using  (67)  and  (68),  we  see  that 


1  —  M  n  Tn 


Mn  Tn 


Ll  -  Mnrn 


arg 


1  ~  {Xn  +  iy) 
1  +  (*n  +  iy) 


l-(tfn-H(y+Ayn)) 

- 

l-(jrn+z(2/-A?/n)) 

l+On-H(2/+Ayn)) 

l+0„+z(y-Af/*)) 

l-(rn+z(y-Ayn)) 

l+(^n+%  +  Ayn)) 

l+^n+afy-Ay*)) 

< 


(vTT -  xl  +  (y  +  Ayn)2)2  +  4 x2(y  +  A ?/n)2 

-Vi1  -x2n  +  (y-  Ayn)2)2  +  4x2n(y  -  A?/n)2)  /(2(1  +  x\  +  y2  +  (A?/n)2))) 
■vA1  -*l  +  (y  +  Ayn)2)2  +  4ar^  +  4xl(y  +  A yn)2  -  \/(l  -  a;2  +  {y  -  Ayn)2)2 


2(1  +  x2n  +  y2  +  (A  ynf) 


x2n  +  2yAyn 


1  +  %l  +  y2  +  (A  yn)2 
n  ~h  2ymaJA?/n 
2  .  4xn2/ma2.6 

*n  + 


—  -|- 


\/l  -  c2 

^Vmax  € 


Since 


lim  x 

n— +oo 


L 

n  \  'L'n 


+ 


4  y„ 


VT^" 


=  0, 


this  gives  that  4 


~rn 


■  Af  n  rn 


as  n  — »  oo. 


□ 
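The limsup bound of Theorem 6.1.5 can be examined numerically from (65) and (66) alone, that is, from $r_n = (1 - x_n)/(1 + x_n)$ and $M_n = |1 - (x_n + iy_{max})|/|1 + (x_n + iy_{max})|$. The sketch below compares the ratio $(M_n - r_n)/(1 - M_nr_n)$ with the finite-$n$ bound $(2x_n^2 + y_{max}^2)/(2 + 2x_n^2 + y_{max}^2)$ and with its limit. The sample values of $x_n$ and $y_{max}$ are arbitrary and the function names are hypothetical.

```python
def r_of(x):
    """(65): smallest modulus on S_n."""
    return (1 - x) / (1 + x)


def m_of(x, y_max):
    """(66): largest modulus on S_n."""
    s = complex(x, y_max)
    return abs(1 - s) / abs(1 + s)


def ratio(x, y_max):
    """(M_n - r_n) / (1 - M_n r_n)."""
    m, r = m_of(x, y_max), r_of(x)
    return (m - r) / (1 - m * r)


def finite_bound(x, y_max):
    """Finite-n bound (2 x^2 + y_max^2) / (2 + 2 x^2 + y_max^2)."""
    return (2 * x**2 + y_max**2) / (2 + 2 * x**2 + y_max**2)


y_max = 1.5
limit = y_max**2 / (2 + y_max**2)
for x in (0.5, 0.1, 0.01, 0.001):
    print(x, ratio(x, y_max), finite_bound(x, y_max), limit)
```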


6.2 Using the frame

At this point, it is necessary to look at the requirements to actually use the frame designed in Section 6.1.3. One of the main issues is how to work with sampled data when all of the theory dealt with functions defined on a continuum. Other issues include finding the dual frame coefficients necessary to define the projection operators, $P_{V_j}$, required for representation in this frame. Also, the points $\{a_{n,k}\}$ to be used from the sets $S_n$ must be discussed.




6.2.1 Function represented by the sampled data. All of the theory presented so far is appropriate for functions defined almost everywhere on a continuum. However, the speech we wish to work with is known only through sampled values. To use the theory, therefore, we must first determine what function we wish the sampled data to represent. Since the Laplace transform also plays an important part in the transform between $L^2(\mathbb{R}^+)$ and $H^2(\mathbb{D})$, the chosen form should also be easy to transform analytically.

Step  functions  will  be  used  here.  In  particular,  the  speech  represented  by  the  sampled 
data  will  be  assumed  to  be  a  step  function  with  the  sampled  values  being  the  height  of 
evenly  spaced  steps.  Clearly,  this  is  not  what  speech  really  is.  However,  this  representation 
(as  will  be  seen)  is  very  easy  to  handle  analytically,  making  it  desirable  in  this  context. 

The  following  theorem  gives  the  Laplace  transform  of  a  function  of  this  form  at 
certain  convenient  sample  points.  Use  is  made  of  the  Discrete  Fourier  Transform,  the 
definition  of  which  follows. 

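Definition 6.2.1 Fix $\Delta t \in \mathbb{R}^+$ and $N \in 2\mathbb{Z}^+$ and define
$\Delta\omega = \frac{2\pi}{N\Delta t}$. Let $x = \{x_m\}_{m=0}^{N-1} \subset \mathbb{C}$.
The Discrete Fourier Transform, $\mathrm{DFT} : \ell^2(N) \to \ell^2(N)$, of $x$ is given by

\[
\mathrm{DFT}[x](k) = \sum_{m=0}^{N-1} x_m e^{-ik\Delta\omega\, m\Delta t}. \tag{69}
\]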

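Theorem 6.2.2 Fix $\Delta t, a \in \mathbb{R}^+$, $N \in 2\mathbb{Z}^+$. Let $f \in L^2(\mathbb{R}^+)$ be given by

\[
f(t) = \sum_{m=0}^{N-1} f(m\Delta t)\, 1_{[m\Delta t,(m+1)\Delta t)}(t)
\]

and $W \in L^2(\mathbb{R}^+)$ by

\[
W(t) = e^{-at}\, 1_{[0,N\Delta t)}(t). \tag{70}
\]

Then, for $\Delta\omega = \frac{2\pi}{N\Delta t}$, $k \in \{-\frac{N}{2}, \dots, -1, 0, 1, \dots, \frac{N}{2}\}$,
and $\sigma \in \mathbb{R}$, we have

\[
\mathcal{L}[Wf](\sigma + ik\Delta\omega) =
\frac{1 - e^{-(a+\sigma+ik\Delta\omega)\Delta t}}{\sqrt{2\pi}\,(a+\sigma+ik\Delta\omega)}\,
\mathrm{DFT}\left[\{f(m\Delta t)e^{-(a+\sigma)m\Delta t}\}_{m=0}^{N-1}\right](k),
\]

where DFT is as defined in (69).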




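Proof. First, considering the transform of a single, windowed step, we have

\begin{align*}
\mathcal{L}\left[e^{-at}\, 1_{[m\Delta t,(m+1)\Delta t)}\right](s)
&= \frac{1}{\sqrt{2\pi}} \int_0^\infty e^{-at}\, 1_{[m\Delta t,(m+1)\Delta t)}(t)\, e^{-st}\,dt \\
&= \frac{1}{\sqrt{2\pi}} \int_{m\Delta t}^{(m+1)\Delta t} e^{-(s+a)t}\,dt \\
&= \frac{1}{\sqrt{2\pi}\,(s+a)} \left(e^{-(s+a)m\Delta t} - e^{-(s+a)(m+1)\Delta t}\right) \\
&= \frac{\left(1 - e^{-(s+a)\Delta t}\right) e^{-(s+a)m\Delta t}}{\sqrt{2\pi}\,(s+a)}.
\end{align*}

This gives

\[
\mathcal{L}[Wf](s) = \frac{1 - e^{-(s+a)\Delta t}}{\sqrt{2\pi}\,(s+a)}
\sum_{m=0}^{N-1} f(m\Delta t)\, e^{-(s+a)m\Delta t}.
\]

Letting $s = \sigma + ik\Delta\omega$, we have

\[
\mathcal{L}[Wf](\sigma + ik\Delta\omega) =
\frac{1 - e^{-(a+\sigma+ik\Delta\omega)\Delta t}}{\sqrt{2\pi}\,(a+\sigma+ik\Delta\omega)}
\sum_{m=0}^{N-1} f(m\Delta t)\, e^{-(a+\sigma)m\Delta t}\, e^{-ik\Delta\omega\, m\Delta t}.
\]

□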

Note that in Theorem 6.2.2, we have represented the Laplace transform at certain
sample points in the complex plane in terms of a DFT of the function’s sampled values
scaled by decaying exponentials. Despite the use of the DFT and the sampled data, this
expression yields exact values of the Laplace transform of a function of the assumed form
at the sample points $\sigma + ik\Delta\omega$ for $k = -\frac{N}{2}, \dots, \frac{N}{2}$.
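
As an informal check of Theorem 6.2.2, the following Python sketch (not part of the
program described later in this chapter) evaluates the DFT-based expression and compares
it with a direct sum of the exact integrals of the individual windowed steps. The function
names and test values are hypothetical, and the sketch assumes
$\Delta\omega = 2\pi/(N\Delta t)$ and the $1/\sqrt{2\pi}$ normalization of the Laplace
transform that appears in the proof.

```python
import cmath
import math


def laplace_via_dft(samples, dt, a, sigma, k):
    """DFT-based expression of Theorem 6.2.2 at s = sigma + i*k*dw.

    Assumes dw = 2*pi/(N*dt) and a Laplace transform normalized by
    1/sqrt(2*pi). Requires a + s != 0.
    """
    n = len(samples)
    dw = 2.0 * math.pi / (n * dt)
    s = complex(sigma, k * dw)
    # DFT of the samples scaled by the decaying exponential e^{-(a+sigma) m dt}.
    dft = sum(
        x * math.exp(-(a + sigma) * m * dt) * cmath.exp(-1j * k * dw * m * dt)
        for m, x in enumerate(samples)
    )
    factor = (1.0 - cmath.exp(-(s + a) * dt)) / (math.sqrt(2.0 * math.pi) * (s + a))
    return factor * dft


def laplace_stepwise(samples, dt, a, s):
    """Same transform, summing the exact integral of each windowed step."""
    total = 0j
    for m, x in enumerate(samples):
        lo, hi = m * dt, (m + 1) * dt
        total += x * (cmath.exp(-(s + a) * lo) - cmath.exp(-(s + a) * hi)) / (s + a)
    return total / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    heights = [1.0, -0.5, 0.25, 2.0]  # hypothetical sampled values, N = 4
    dt, a, sigma = 0.125, 0.5, 1.0
    dw = 2.0 * math.pi / (len(heights) * dt)
    for k in range(-2, 3):
        lhs = laplace_via_dft(heights, dt, a, sigma, k)
        rhs = laplace_stepwise(heights, dt, a, complex(sigma, k * dw))
        print(k, abs(lhs - rhs))
```

The two evaluations should agree to rounding error, which reflects the point made above:
for a function of the assumed step form, the DFT expression is exact rather than an
approximation.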

6.2.2 Dual frame to a windowed frame. We are given a frame, $\{\psi_k\}_{k=1}^{n}$, and are
working with the set of windowed frame elements $\{\phi_k\}_{k=1}^{n} = \{W\psi_k\}_{k=1}^{n}$,
also a frame, where $W$ is given by (70). We wish to know the dual $\{\tilde{\phi}_k\}_{k=1}^{n}$
to this new frame. Unfortunately, the dual frame of a windowed frame is not (necessarily)
the same as the windowed dual frame, although in many cases, it may be close. The following
theorem will be useful in determining the actual dual frame.




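Theorem 6.2.3 Let the finite set $\{\psi_k\}_{k=1}^{n}$ be an exact frame in some Hilbert space. Then
the dual frame $\{\tilde{\psi}_k\}_{k=1}^{n}$ is given by

\[
\tilde{\psi}_k = \sum_{j=1}^{n} c_{k,j}\,\psi_j
\]

where

\[
[A^{-1}]_{k,j} = c_{k,j}
\]

and the matrix $A$ is given by

\[
[A]_{k,j} = \langle \psi_k, \psi_j \rangle.
\]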

Proof. The dual frame, always being in the span of the frame elements, can be
expressed by

\[
\tilde{\psi}_k = \sum_{j=1}^{n} c_{k,j}\,\psi_j.
\]

Since this frame is exact, we have by Theorem 4.1.3 that

\[
\langle \tilde{\psi}_k, \psi_m \rangle = \delta_{km}.
\]

That is,

\[
\sum_{j=1}^{n} c_{k,j} \langle \psi_j, \psi_m \rangle = \delta_{km}.
\]

This gives

\[
I = CA,
\]

where $[C]_{k,j} = c_{k,j}$, which implies that $C = A^{-1}$. That is, $[A^{-1}]_{k,j} = c_{k,j}$. □
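
Theorem 6.2.3 translates directly into a small computation: form the Gram matrix of the
frame, invert it, and combine the frame elements with the rows of the inverse. The
following Python sketch illustrates this on a hypothetical three-element exact frame in
$\mathbb{R}^3$, using exact rational arithmetic so that biorthogonality can be checked
exactly. The function names and the example vectors are illustrative assumptions, and
real-valued vectors are used so that no complex conjugation is needed in the inner
product.

```python
from fractions import Fraction


def inner(u, v):
    """Real inner product of two finite vectors."""
    return sum(x * y for x, y in zip(u, v))


def invert(matrix):
    """Gauss-Jordan inverse of a small nonsingular square matrix (exact)."""
    n = len(matrix)
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a nonzero pivot; raises StopIteration if the matrix is singular.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_frame(frame):
    """Dual of a finite exact frame: dual_k = sum_j [A^{-1}]_{k,j} frame_j."""
    gram = [[inner(u, v) for v in frame] for u in frame]  # [A]_{k,j} = <psi_k, psi_j>
    c = invert(gram)
    size, dim = len(frame), len(frame[0])
    return [
        [sum(c[k][j] * frame[j][d] for j in range(size)) for d in range(dim)]
        for k in range(size)
    ]


if __name__ == "__main__":
    frame = [[1, 1, 0], [0, 1, 1], [1, 0, 2]]  # hypothetical exact frame in R^3
    dual = dual_frame(frame)
    for k, d in enumerate(dual):
        print(k, [inner(d, psi) for psi in frame])
```

Each printed row should be the corresponding row of the identity matrix, which is the
biorthogonality relation used in the proof.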

In order to use Theorem 6.2.3, it will be necessary to determine the inner product of
windowed frame elements. This is easily done (as will be seen) in the $L^2(\mathbb{R}^+)$ space. For
this reason (among others), it is necessary to find the representation in the $L^2(\mathbb{R}^+)$ space
for the simple pole functions used in the representations in $H^2(\mathbb{D})$. The following lemma
gives that representation.


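Lemma 6.2.4 For a pole of the form

\[
\psi(z) = \frac{(1-|a|^2)^{1/2}}{1 - \bar{a}z}
\]

where $|a| < 1$, $T^{-1}\psi$ is given by

\[
T^{-1}\psi(t) = \frac{\sqrt{2}\,(1-|a|^2)^{1/2}}{1+\bar{a}}\; e^{-\frac{1-\bar{a}}{1+\bar{a}}t} \tag{71}
\]

where $T$ is as defined in (63).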

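Proof. Since $T^{-1} = \mathcal{L}^{-1}S^{-1}$, we will first look at $S^{-1}\psi$, as defined by (60).

\begin{align*}
S^{-1}\psi(s)
&= \frac{\psi\left(\frac{1-s}{1+s}\right)}{\sqrt{\pi}\,(1+s)} \\
&= \frac{(1-|a|^2)^{1/2}}{\sqrt{\pi}\,(1+s)\left(1 - \bar{a}\,\frac{1-s}{1+s}\right)} \\
&= \frac{(1-|a|^2)^{1/2}}{\sqrt{\pi}\,\left((1+s) - \bar{a}(1-s)\right)} \\
&= \frac{(1-|a|^2)^{1/2}}{\sqrt{\pi}\,\left((1-\bar{a}) + s(1+\bar{a})\right)} \\
&= \frac{(1-|a|^2)^{1/2}}{\sqrt{\pi}\,(1+\bar{a})\left(s + \frac{1-\bar{a}}{1+\bar{a}}\right)}.
\end{align*}

Applying the inverse Laplace transform operator, defined in (62), to this result, we
get

\begin{align*}
T^{-1}\psi(t) &= \mathcal{L}^{-1}S^{-1}\psi(t) \\
&= \mathcal{L}^{-1}\left[\frac{(1-|a|^2)^{1/2}}{\sqrt{\pi}\,(1+\bar{a})\left(s + \frac{1-\bar{a}}{1+\bar{a}}\right)}\right](t) \\
&= \frac{\sqrt{2}\,(1-|a|^2)^{1/2}}{1+\bar{a}}\; e^{-\frac{1-\bar{a}}{1+\bar{a}}t}.
\end{align*}

□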




6.2.3 Inner products with windowed frame elements. In practice, we will be
required to find the inner product with the windowed frame elements. Note that for
real-valued windowing functions, $W \in L^2(\mathbb{R})$, we get $\langle f, Wg\rangle = \langle Wf, g\rangle$
for all $f, g \in L^2(\mathbb{R})$. For our purposes, one implication of this is that the window may
be applied to the data rather than to the frame element. For our frame elements
$\{\psi_{n,k}\}$, we have $\langle g, \psi_{n,k}\rangle = (1 - |a_{n,k}|^2)^{1/2} g(a_{n,k})$ for
$g \in H^2(\mathbb{D})$; that is, the inner product with one of our frame elements samples the
value of the function at the sample point $a_{n,k}$. By Theorem 6.2.2, (59), and (63), we have
exact values for $TWf \in H^2(\mathbb{D})$ at certain sample points, which makes this theorem very
useful provided the points $\{a_{n,k}\}$ include those sample points.


6.2.4 Representation of frame elements, $\psi_n B_n$. Much of the theoretical work
dealt with frame elements $\psi_n B_n$ of the form

\[
\psi_n(z) = \frac{(1-|a_n|^2)^{1/2}}{1 - \bar{a}_n z} \tag{72}
\]

and

\[
B_n(z) = \prod_{k=1}^{n-1} \frac{a_k - z}{1 - \bar{a}_k z}. \tag{73}
\]

As shown in Chapter V, $\psi_n B_n \in \operatorname{span}\{\psi_k\}_{k=1}^{n}$. For practical (i.e.,
computer) applications, it is desirable to use the expression
$\psi_n B_n = \sum_{k=1}^{n} b_k \psi_k$. The following lemma gives the values of the constants
$b_k$ in that representation.


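Lemma 6.2.5 For $\psi_n$ and $B_n$ as in (72) and (73), with $a_n \neq a_k$ for all $n \neq k$, we have

\[
\psi_n B_n = \sum_{k=1}^{n} b_k \psi_k
\]

where

\[
b_n = \prod_{j=1}^{n-1} \frac{1 - \bar{a}_n a_j}{\bar{a}_j - \bar{a}_n}
\]

and where for $k = 1, \dots, n-1$,

\[
b_k = \frac{(1-|a_n|^2)^{1/2}(1-|a_k|^2)^{1/2}}{1 - \bar{a}_k a_n}
\prod_{j=1,\; j \neq k}^{n} \frac{1 - \bar{a}_k a_j}{\bar{a}_j - \bar{a}_k}.
\]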




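Proof. This proof is strictly algebraic in nature. Since $a_n \neq a_k$ for all $n \neq k$, we
have that $\psi_n B_n$ can be expanded according to

\[
\psi_n B_n = \frac{(1-|a_n|^2)^{1/2}}{1 - \bar{a}_n z} \prod_{j=1}^{n-1} \frac{a_j - z}{1 - \bar{a}_j z}
= \sum_{k=1}^{n} b_k \frac{(1-|a_k|^2)^{1/2}}{1 - \bar{a}_k z}.
\]

Multiplying both sides by the quantity $(1 - \bar{a}_n z)$, we have

\[
(1-|a_n|^2)^{1/2} \prod_{j=1}^{n-1} \frac{a_j - z}{1 - \bar{a}_j z}
= b_n (1-|a_n|^2)^{1/2} + (1 - \bar{a}_n z) \sum_{j=1}^{n-1} b_j \psi_j.
\]

Evaluating this expression at $z = 1/\bar{a}_n$, this gives

\[
b_n = \prod_{j=1}^{n-1} \frac{a_j - \frac{1}{\bar{a}_n}}{1 - \frac{\bar{a}_j}{\bar{a}_n}}
= \prod_{j=1}^{n-1} \frac{1 - a_j \bar{a}_n}{\bar{a}_j - \bar{a}_n}.
\]

The same technique can be used to show the value of $b_k$ for $k = 1, \dots, n-1$.

□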
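
Because Lemma 6.2.5 is purely algebraic, it can be spot-checked numerically. The Python
sketch below evaluates both sides of the expansion at a test point in the disk. It assumes
the conjugate-normalized forms $\psi_a(z) = (1-|a|^2)^{1/2}/(1-\bar{a}z)$ and
$B_n(z) = \prod_{j<n} (a_j - z)/(1-\bar{a}_j z)$; the function names, poles, and test point
are hypothetical. The single formula used for all $k$ reduces, for $k = n$, to the product
given for $b_n$, since the prefactor then equals one.

```python
import math


def psi(a, z):
    """Normalized simple pole sqrt(1-|a|^2) / (1 - conj(a) z)."""
    return math.sqrt(1.0 - abs(a) ** 2) / (1.0 - a.conjugate() * z)


def blaschke(points, z):
    """Product of (a_j - z) / (1 - conj(a_j) z) over the given points."""
    out = 1.0 + 0j
    for a in points:
        out *= (a - z) / (1.0 - a.conjugate() * z)
    return out


def expansion_coefficients(points):
    """Coefficients b_1..b_n with psi_n * B_n = sum_k b_k psi_k.

    `points` holds the distinct poles a_1..a_n inside the unit disk.
    """
    a_n = points[-1]
    c_n = math.sqrt(1.0 - abs(a_n) ** 2)
    coeffs = []
    for k, a_k in enumerate(points):
        prod = 1.0 + 0j
        for j, a_j in enumerate(points):
            if j != k:
                prod *= (1.0 - a_k.conjugate() * a_j) / (a_j.conjugate() - a_k.conjugate())
        c_k = math.sqrt(1.0 - abs(a_k) ** 2)
        # For k = n the prefactor c_n * c_k / (1 - |a_n|^2) equals 1.
        coeffs.append(c_n * c_k / (1.0 - a_k.conjugate() * a_n) * prod)
    return coeffs


if __name__ == "__main__":
    poles = [0.3 + 0.2j, -0.5 + 0.1j, 0.1 - 0.6j]  # hypothetical distinct poles
    z = 0.2 - 0.3j
    lhs = psi(poles[-1], z) * blaschke(poles[:-1], z)
    rhs = sum(b * psi(a, z) for b, a in zip(expansion_coefficients(poles), poles))
    print(abs(lhs - rhs))
```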


6.2.5 Choice of points. As mentioned previously, the set of points $\{a_{n,k}\}$,
used in (54) and (55) to define our frame, should include the set
$\{x_n + ik\Delta\omega\}_{k=-N/2}^{N/2}$ for each chosen $x_n$, under the mapping
$z \mapsto \frac{1-z}{1+z}$. By Lemma 6.1.4, we know that for each $x_n$, these
points are evenly spaced under the pseudo-hyperbolic metric, $\rho$. Therefore, it is necessary
to add to this set only enough points so that the separation requirements of Theorem 3.1.4
are met.

6.3  The  computer  program 

The design of a computer program based on the frame developed in Section 6.1 is
discussed. Designing this application program involved making additional assumptions
and simplifications. The main points of correspondence between theory and application
are described below. Brief summaries of some of the heuristics employed are also given.

The main simplification of concern is the choice of basis functions. The sets chosen
by this application are a subset of those required by Theorem 3.6.4. This is not necessarily
unreasonable for this application, however, since only a small number of basis elements
are to be retained in any case. The choice of basis elements is discussed below, followed
by discussions of other heuristics and approximations. Taken as a whole, the remainder of
this section presents a very rough sketch of the design of the program.

6.3.1 Basis set selection. As seen in Section 6.1, for each offset time used in the
creation of the frame, a frame for $L^2(\mathbb{R}^+)$ is required. For this application, it was decided to
use the same frame for each offset time, regardless of the underlying data to be represented.
This frame is of the form given in Theorem 5.2.10.

As shown in Lemma 3.6.3, for our choice of basis functions, we know that
$\langle f, \psi_{n,k}\rangle = (1 - |a_{n,k}|^2)^{1/2} f(a_{n,k})$. That is, the basis function samples
the function at the sample point $a_{n,k}$. Theorem 3.1.4 gives the constraints under which the
sample points are chosen.

Section  6.1.5  gives  curves  Sn  from  which  the  sample  points  are  to  be  chosen.  However, 
to  find  the  sampled  values  at  arbitrary  points  of  these  curves  is  computationally  expensive. 
So,  computationally  tractable  subsets  of  these  curves,  based  on  the  calculations  shown 
in  Theorem  6.2.2,  are  used  instead.  The  computationally  tractable  subsets  used  by  the 
program  consist  of  points  along  a  horizontal  segment  in  the  complex  plane,  for  which  the 
sample  values  can  be  easily  computed  (given  Theorem  6.2.2).  These  points  are  mapped  via 
the mapping $z \mapsto \frac{1-z}{1+z}$ onto the right half of curves of the form $S_n$ described in Section 6.1.5.
Depending  on  the  settings  of  the  program,  the  left-half  of  the  curve  can  either  be  ignored 
or  filled-in  with  points  generated  by  other  horizontal  segments. 

6.3.2 Determination of offset times and analysis window sizes. There are two
methods by which this program can determine the offset times and sizes of the analysis
windows: data-dependent and data-independent. In either case, the maximum analysis
window size is specified.

In  the  data-independent  method,  the  analysis  windows  are  evenly  spaced  in  time. 
The desired number of analysis windows to overlap each data point, n, is used to determine
the analysis window size, with the size being determined as the largest multiple of n not
greater  than  the  maximum  analysis  window  size.  One  window  starts  at  the  first  sample 
data  point,  which  determines  the  offset  times  of  the  remaining  windows.  Enough  analysis 
windows  are  generated  so  that  each  sample  data  point  is  included  in  n  analysis  windows. 
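
The data-independent layout can be sketched as follows. This is a minimal Python
illustration of the rule just described, not the program's actual code: the function name
and arguments are hypothetical, and it assumes that windows are allowed to begin before the
first sample and to run past the last one (with the out-of-range portion treated as empty
or zero-padded), which is one way to give the first and last samples the same coverage as
the rest.

```python
def even_windows(num_samples, overlap, max_size):
    """Return (start, size) pairs so every sample index lies in `overlap` windows.

    The window size is the largest multiple of `overlap` not greater than
    `max_size`; consecutive windows are offset by size / overlap samples.
    """
    size = (max_size // overlap) * overlap
    if size == 0:
        raise ValueError("max_size must be at least overlap")
    hop = size // overlap
    # Assumption: windows may start before sample 0 so early samples are covered.
    first = -(overlap - 1) * hop
    return [(start, size) for start in range(first, num_samples, hop)]


def coverage(windows, index):
    """Number of windows containing the sample at `index`."""
    return sum(1 for start, size in windows if start <= index < start + size)


if __name__ == "__main__":
    windows = even_windows(num_samples=100, overlap=4, max_size=66)
    print(windows[:5])
    print({coverage(windows, i) for i in range(100)})
```

With these hypothetical settings the window size is 64 samples, the offset between windows
is 16 samples, and one window starts exactly at sample 0, as required above.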

In  the  data-dependent  method,  the  program  uses  a  heuristic  algorithm  (described  in 
Appendix  C)  to  estimate  where  glottal  pulses  begin  and  places  analysis  windows  starting 




at  those  locations.  An  additional  window  is  added  which  starts  at  the  first  data  point. 
Following  this,  additional  windows  (if  necessary)  are  placed  to  ensure  that  every  data  point 
is  contained  in  at  least  n  analysis  windows.  Finally,  the  lengths  of  the  analysis  windows 
are  determined  such  that  every  data  point  is  contained  in  exactly  n  analysis  windows. 
This  results  in  analysis  windows  which  are  of  varying  lengths  and  are  unevenly  spaced. 
Potentially,  many  of  these  windows  are  of  lengths  much  less  than  the  maximum  analysis 
window  length,  depending  on  the  vocal  pitch,  n,  and  the  maximum  analysis  window  size. 

6.3.3 A parameter for internal rescaling. In the representation given in Theorem 3.6.4,
a basis is given which requires sets of sample points in the complex unit disk,
$\mathbb{D}$. In the simplest case these sample points are elements of torus-shaped, closed subsets of
$\mathbb{D}$ arranged about the origin. With the transform given in (59), the point $s = 1$ is mapped
to the origin. That is, the point at the origin corresponds to the decaying exponential $e^{-t}$.
Sample points taken from closed, torus-shaped subsets of $\mathbb{D}$ about the origin correspond
to decaying exponentials of both higher and lower decay rates.

Note  that  in  the  above,  the  units  of  t  are  not  given.  If  the  time  units  are  assumed  to 
be  seconds,  one  sees  a  much  slower  decay  rate  than  that  observed  in  the  impulse  response 
of  the  vocal  tract.  That  is,  the  decay  rate  seen  in  a  frame  element,  assuming  a  time  unit  of, 
e.g.,  seconds,  is  not  necessarily  a  good  match  for  speech.  Because  of  this,  it  is  desirable  to 
rescale the units so that an exponential of a decay rate more representative of that seen in
the impulse response of the vocal tract corresponds to the point at the origin. To do this,
an additional variable, spread, has been added to the program. This variable corresponds
to an internal (to the program) multiplication of the sampling rate.

To  summarize,  spread  can  be  considered  as  a  “knob”  to  be  turned  as  far  as  the 
operation  of  the  program  is  concerned.  Tweaking  it  is  encouraged. 

6.3.4  Basis  selection.  For  this  application  we  desire  a  compact  approximation. 
That  is,  for  a  given  example  of  speech,  a  reasonably  close  approximation  is  desired  which 
is  built  from  a  reasonably  small  number  of  basis  elements.  This  requires  some  selection 
criteria  for  the  basis  elements. 

The  method  used  in  the  application  program  is  to  choose  a  fixed  (user  input)  number 
of  basis  elements  for  each  of  the  offset  times.  The  selection  of  basis  elements  for  each  time 
increment  is  independent  of  that  for  the  other  time  increments. 

For  each  offset  time,  the  basis  elements  to  be  used  are  chosen  via  a  heuristic  very 
similar  to  the  orthogonal  matching  pursuit  described  in  [24].  In  the  heuristic  used  here,  the 




basis  elements  are  chosen  iteratively.  After  each  selection,  the  projection  onto  the  space 
spanned  by  the  basis  elements  chosen  thus  far  (for  that  offset  time  only)  is  found.  The 
residual  found  by  subtracting  the  projection  from  the  original  is  used  in  the  selection  of 
the  next  basis  element.  This  selection  is  based  on  the  magnitudes  of  the  inner  products 
of  the  candidate  basis  elements  with  the  signal  to  be  represented,  weighted  by  a  factor 
designed  to  favor  basis  elements  which  are  further  (in  the  pseudo-hyperbolic  metric)  from 
those already chosen. (There is reason to believe that this selection criterion may yield a
perceptually better projection.)
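
For orientation, the following Python sketch shows the standard orthogonal matching
pursuit loop that the heuristic resembles: select the atom most correlated with the current
residual, project onto the span of everything selected so far, and repeat. It is not the
program's selection rule. In particular, the pseudo-hyperbolic distance weighting described
above is omitted because its exact form is not given here, and the function names,
real-valued vectors, and test data are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def orthogonal_matching_pursuit(signal, atoms, count):
    """Greedily pick up to `count` atoms; return (chosen indices, final residual).

    Atoms are assumed to be nonzero real vectors of the same length as `signal`.
    """
    residual = list(signal)
    chosen = []
    ortho = []  # orthonormal basis for the span of the atoms chosen so far
    for _ in range(count):
        best, best_score = None, 0.0
        for idx, atom in enumerate(atoms):
            if idx in chosen:
                continue
            # Plain normalized correlation with the residual (no distance weighting).
            score = abs(dot(residual, atom)) / math.sqrt(dot(atom, atom))
            if score > best_score:
                best, best_score = idx, score
        if best is None:
            break  # residual is orthogonal to every remaining atom
        # Gram-Schmidt step: orthogonalize the new atom against earlier ones.
        q = list(atoms[best])
        for e in ortho:
            c = dot(q, e)
            q = [x - c * y for x, y in zip(q, e)]
        norm = math.sqrt(dot(q, q))
        if norm < 1e-12:
            break  # new atom is numerically dependent on those already chosen
        q = [x / norm for x in q]
        chosen.append(best)
        ortho.append(q)
        # Removing the component along q leaves the signal minus its projection.
        c = dot(residual, q)
        residual = [x - c * y for x, y in zip(residual, q)]
    return chosen, residual


if __name__ == "__main__":
    atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 1, 0, 0]]
    picked, rest = orthogonal_matching_pursuit([3.0, 0.0, 4.0, 0.0], atoms, 2)
    print(picked, rest)
```

A weighting of the kind described in the text could be introduced by multiplying `score`
by a factor that depends on the candidate's distance from the atoms already in `chosen`.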

6.3.5  Approximation  coefficients.  Once  the  basis  elements  for  the  approximation 
are  chosen  for  each  time  increment,  it  is  necessary  to  find  the  coefficient  associated  with 
each  one  to  represent  the  projection  of  the  signal  onto  the  span  of  the  chosen  basis  elements. 
This  representation  is  found  using  the  iterative  approximation  given  in  Theorem  4.3.4. 

The projections onto the basis elements chosen for each offset time are done via matrix
inversions. These results are then used in the iterative algorithm, where the number of
iterations used is a user input.

Since the inner product between non-overlapping basis elements is zero, iterating
with non-overlapping analysis windows has no effect, in theory, on the inverse frame
computation. However, such iteration is sometimes of use in this implementation, to improve
numeric accuracy. In certain cases, the matrix inversions are unstable due to the poles
being chosen too close together, resulting in a near-singular matrix. In this case, the inverse
matrix is approximated by the inverse matrix for the non-windowed basis functions since
that inverse is known exactly. This approximate inverse is usually close enough to the true
inverse so that iterations of the inverse frame approximation yield a good result.
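
The reason an approximate inverse can still lead to a good result is easiest to see in a
generic linear-algebra setting. The Python sketch below is an illustration only and is not
claimed to be the iteration of Theorem 4.3.4: given a matrix $A$ and an approximation $B$
of $A^{-1}$, the update $x \leftarrow x + B(b - Ax)$ converges to the solution of $Ax = b$
whenever the spectral radius of $I - BA$ is less than one. The function names and the
small test system are hypothetical.

```python
def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def refine_with_approximate_inverse(a, approx_inv, b, iterations):
    """Iterate x <- x + B (b - A x), where B approximates the inverse of A."""
    x = matvec(approx_inv, b)  # initial guess from the approximate inverse
    for _ in range(iterations):
        residual = [bi - yi for bi, yi in zip(b, matvec(a, x))]
        correction = matvec(approx_inv, residual)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0], [2.0, 3.0]]
    # Crude approximate inverse: invert only the diagonal of A.
    approx_inv = [[0.25, 0.0], [0.0, 1.0 / 3.0]]
    print(refine_with_approximate_inverse(a, approx_inv, [1.0, 2.0], 60))
```

In this toy system the exact solution is $(0.1, 0.6)$, and the error shrinks by a factor of
six every two iterations even though the approximate inverse ignores the off-diagonal
entries entirely.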

6.4  Summary 

Presented  in  this  chapter  was  an  application  of  the  results  of  Chapters  III,  IV,  and  V. 
A  frame  tailored  to  speech  representation  in  L2(R)  was  constructed  using  frames  for  H2(D). 
The  necessary  constraints  and  bounds  were  proven  to  enable  the  use  of  the  iterative  method 
given in Theorem 4.3.4. The design of a computer program which finds representations in
such a frame was discussed.

Of particular note, the frame elements of this speech frame have the useful property
of sampling the value of the Discrete Fourier Transform of a sampled speech signal, allowing
for fast calculation of inner products with frame elements. This property is important in
that digitally recorded speech is already sampled, with good interpolation between sample
points being problematic.

The approach taken here could be used for other types of functions; it is not
restricted to speech. It should prove to be useful in other areas of signal analysis and in
the  representation  of  other  types  of  functions,  particularly  where  the  characteristics  of  the 
function  vary  with  the  independent  variable. 




VII.  Computer  experiments 


In this chapter, the computer experiments will be presented and discussed. Three
different types of analyses were done: fine-scale, medium-scale, and large-scale.

In the fine-scale analysis, an in-depth look is taken at the performance of the
projection algorithm for a single offset time. This analysis is done for short segments of data
(64  data  points  or  fewer),  both  for  segments  of  speech  and  segments  of  non-speech  signals 
of  similar  frequency  content.  In  addition,  the  performance  of  the  algorithm  on  harmonic 
speech  is  examined  with  respect  to  the  location  of  the  start  of  the  glottal  pulse  within  the 
analysis  window. 

In  the  medium-scale  analysis,  longer  data  segments  are  used.  For  tests  involving 
actual  speech,  contiguous,  harmonic  speech  is  used  from  a  male  and  a  female  speaker.  A 
row  of  data  from  a  digitized  image  and  a  reversed  segment  of  harmonic  speech  are  used 
as  non-speech  samples  for  this  medium-scale  analysis  only.  These  samples  are  used  to  test 
the  reconstruction  algorithm  of  Theorem  4.3.4  as  well  as  to  examine  some  of  the  properties 
of  the  representation. 

In  the  large-scale  analysis,  the  algorithm  is  tested  only  on  two  entire  sentences,  one 
from  a  male  speaker  and  one  from  a  female  speaker.  The  data  compression  possible  by 
using  this  representation  is  examined.  Additional  analysis  is  done  using  the  same  two 
sentences  to  which  Gaussian  white  noise  has  been  added,  to  examine  the  noise  suppression 
characteristics  inherent  in  the  approximation  algorithm. 

The recorded speech used in these tests is from the TIMIT database. This
extensive corpus, a joint effort by several sites sponsored by the Defense Advanced Research
Projects Agency - Information Science and Technology Office (DARPA-ISTO), is currently
maintained by the National Institute of Standards and Technology (NIST). The TIMIT
database contains recordings of 6300 sentences read by speakers from all major dialect
regions of the United States. It is indexed by a unique speaker identification code and sentence
identification code for each entry.

Arpabet notation was used to identify the phonemes used in analyses
on shorter signal lengths. A description of the Arpabet notation can be found in, e.g.,
[23].

Entropic’s Signal Processing System (ESPS), written by the Entropic Research
Laboratory, Inc., was used to display and manipulate much of the data used in these analyses.




All  analysis  on  actual  speech  used  two  recordings  from  the  TIMIT  database.  Where 
shorter  segments  are  needed  for  testing,  they  are  extracted  from  these  two  sentences.  The 
female voice was speaker fcmm0 saying sentence sa1 (“She had your dark suit in greasy
wash water all year.”). The male voice was speaker mcmj0 saying sentence sxl9\ (“They
enjoy  it  when  I  audition.”). 

The  TIMIT  database  recordings  are  digitized  at  a  sample  rate  of  16  kHz.  Since  most 
of  the  important  information  in  human  speech  is  contained  in  the  frequency  range  0-4  kHz, 
the  speech  was  down-sampled  using  the  function  sfconvert  of  ESPS  to  a  sampling  rate  of 
8  kHz  for  the  analysis. 

In  every  case  reported,  normalized  L2  error  is  the  L2  error  normalized  by  the  L2 
norm  of  the  signal.  When  working  with  signals  to  which  noise  has  been  added,  the  L2 
error  is  normalized  by  the  L2  norm  of  the  clean  signal. 
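
For concreteness, the error measure can be written as a short Python function. This is a
sketch of the definition just given, with hypothetical names. For noisy inputs it takes
the clean signal as a separate argument for the normalization; which signal the
approximation is compared against in the noisy case is left to the caller, since the text
above specifies only the normalization.

```python
import math


def l2_norm(values):
    """Discrete L2 norm of a sequence of real or complex samples."""
    return math.sqrt(sum(abs(v) ** 2 for v in values))


def normalized_l2_error(approximation, target, clean=None):
    """L2 norm of (target - approximation) divided by the L2 norm of a reference.

    The reference is `target` itself unless a separate `clean` signal is
    supplied, as when noise has been added to the signal being analyzed.
    """
    reference = target if clean is None else clean
    difference = [t - a for t, a in zip(target, approximation)]
    return l2_norm(difference) / l2_norm(reference)


if __name__ == "__main__":
    signal = [3.0, 4.0]
    print(normalized_l2_error([3.0, 0.0], signal))
```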

7.1  Fine-scale  analyses 

The purpose of the fine-scale analyses is to examine the performance of representation
based on a set of zero-offset-time exponentially damped sinusoids (simple pole functions),
such as given in (71). This relates to the work presented in Chapter III.

The  performance  of  the  projection  algorithm  was  examined  in  two  ways.  The  first 
analysis  compares  the  performance  of  the  representation  algorithm  on  different  signals, 
some  speech  and  some  not  speech.  The  second  analysis  examines  the  performance  of  the 
projection  algorithm  for  different  alignments  of  one  or  two  glottal  pulses  within  an  analysis 
window. 

7.1.1  Description  of  fine-scale  analyses. 

7.1.1.1 Performance on different types of signal. The analysis was done
with eight signals. The eight signals are described below and are shown in Figure 5.

Three of these signals, the phonemes /IY/, /OY/, and /S/, are actual speech segments
consisting of one glottal pulse period of harmonic speech for /IY/ (female speaker) and
/OY/ (male speaker) and a segment of similar length for /S/ (female speaker). The
segments /IY/ and /OY/ are zero-padded so that each segment has 64 total data
points.

Of the five non-speech signals, four of them (blocks, bumps, doppler, and heavisine)
were inspired by those in [10]. These test functions have various characteristics, such as
structure and frequency content, that seemed to be a good test of the method, so they
were duplicated for use here. The fifth, bumpstx, is the inverse Fourier transform of bumps.
Each signal contains 64 data points.

Blocks is a sequence of step functions of varying widths and heights. Of the four, it
appears (to visual inspection) the least like speech. Bumps resembles the magnitude Fourier
spectrum of a signal containing several well-defined frequency components. It also bears
little resemblance to speech, although it lacks the discontinuities found in blocks. Doppler
is a windowed, chirped sinusoid. Heavisine is a sinusoid added to a step function to form
discontinuities. Bumpstx, being the inverse Fourier transform of bumps, is not speech, but
has spectral qualities that should be well represented by the representation being tested.

To  fairly  compare  the  tests  on  the  different  samples,  I  evaluated  the  magnitude  of 
their  Fourier  transforms  and  rescaled  the  signals  so  that  the  spectrum  was  concentrated 
in  a  range  similar  to  that  of  speech.  I  then  sampled  the  signals  at  the  same  rate  as  that 
for  the  speech.  I  used  a  data  window  of  size  similar  to  that  of  one  glottal  pulse.  In  this 
way,  I  hoped  to  make  the  comparisons  on  a  more  even  foundation.  The  magnitude  Fourier 
transforms of the rescaled, windowed, sampled signals are shown in Figure 6.

Each signal was approximated using a single analysis window. That is, the
approximation is a sum of exponentially damped sinusoids starting at time t = 0. The decay
rate on the window was zero. Thirty-two poles were used for each approximation. For
each signal, we compare the original signal and its approximation for increasing numbers
of poles. These results are shown in Figures 7 through 22. The L2 norm errors between the
original signals and the approximations are plotted in Figure 23.

7.1.1.2 Performance for different alignments. This analysis examines the
performance  of  the  projection  algorithm  for  different  positions  of  the  first  glottal  pulse 
within  an  analysis  window;  it  is  intended  to  show  the  effects  of  mis-alignment  of  the 
glottal  pulse  within  the  analysis  window. 

A  single  glottal  pulse  was  repeated  to  create  the  analysis  data.  A  periodic  shift  of 
the  sampled  data  was  used  to  simulate  different  positions  of  the  first  glottal  pulse  within 
a  window.  Glottal  pulses  were  used  from  the  phonemes  /IY/  and  /OY/. 

Analysis  windows  of  two  (data  dependent)  lengths  were  examined.  In  the  first  case, 
the  analysis  window  contains  one  glottal  pulse;  in  the  second,  two.  This  resulted  in  window 
lengths  of  34  and  68  samples,  respectively,  for  the  phoneme  /IY/  and  window  lengths  of 
59  and  118,  respectively,  for  the  phoneme  /OY/. 




The  program  was  used  to  find  the  approximation  for  each  offset  (shift  of  the  data) 
and  for  numbers  of  poles  from  one  through  20.  Table  1  shows  the  minimum  and  maximum 
normalized  L2  errors  for  the  approximation  with  each  number  of  poles  for  each  phoneme 
for  an  analysis  window  containing  one  and  two  glottal  pulses.  The  errors  for  each  offset 
for  approximations  with  one,  two,  three,  six,  nine,  and  18  poles  are  shown  in  Figures  24 
through  27.  Figures  24  and  25  show  the  results  for  an  analysis  window  containing  one 
glottal pulse for phonemes /IY/ and /OY/, respectively. Figures 26 and 27 show the same
for  an  analysis  window  containing  two  glottal  pulses. 

7.1.2  Discussion  of  results. 

7.1.2.1 Performance on different types of signal. Examining Figures 7
through 10, it can be seen that for harmonic speech (the phonemes /IY/ and /OY/), the
approximation  appears  (a  subjective  judgement)  to  become  close  to  the  original  signal 
with  few  poles,  both  in  the  time  and  frequency  domains.  This  is  both  expected  and  hoped 
for,  since  one  pulse  cycle  of  harmonic  speech  resembles  our  building  blocks  of  damped 
sinusoids.  The  phoneme  /S/  (Figures  11  and  12)  appears  to  be  less  well  represented,  both 
in  the  time  and  frequency  domains.  This  is  also  expected,  since  this  phoneme  strongly 
resembles  Gaussian  white  noise,  which  is  not  modeled  well  by  such  building  blocks. 

Figure 13 shows clearly some of the weak points of this representation as pertains
to representation of non-speech signals, and hence supports my hypothesis that the
representation is better suited to speech than most non-speech. Blocks contains sharp
discontinuities and constant portions. While the approximation may appear close in an L2
norm sense, it has lost many of the characteristics that one might wish to preserve, such as
the  constant  portions  of  the  signal  and  discontinuity  locations.  However,  Figure  14  shows 
that  the  approximation  in  the  frequency  domain  looks  much  better  -  subjectively  as  good 
as  for  the  phonemes  /IY/  and  /OY/  (Figures  8  and  10). 

The  results  for  bumps  are  even  worse  (Figures  15  and  16).  While  the  approximation 
has  large  wiggles  in  the  appropriate  locations,  it  takes  many  more  poles  to  get  a  reasonable 
approximation,  either  in  the  time  or  the  frequency  domain. 

The  results  for  bumpstx  are  somewhat  better,  mainly  due  to  the  way  that  it  was 
constructed  -  as  the  inverse  Fourier  transform  of  a  function  with  well  isolated  spikes.  As 
can  be  seen  in  Figure  17,  the  approximation  does  become  visually  close  in  the  time  domain 
rather  quickly.  In  Figure  18,  it  can  be  seen  that  successive  poles  align  with  different  spikes 
in  the  transform,  behavior  that  is  not  always  obvious  with  some  of  the  other  signals. 




Figure 5. Data used for fine-scale analysis (time axis in seconds): (a) the phoneme /IY/,
female speaker; (b) the phoneme /OY/, male speaker; (c) the phoneme /S/, female speaker;
(d) blocks; (e) bumps; (f) bumpstx; (g) doppler; and (h) heavisine.




Figure 6. Fourier transform of the data used for fine-scale analysis: (a) the phoneme
/IY/, female speaker; (b) the phoneme /OY/, male speaker; (c) the phoneme /S/, female
speaker; (d) blocks; (e) bumps; (f) bumpstx; (g) doppler; and (h) heavisine.




Doppler shows more of the weaknesses of using this representation with arbitrary
L2 signals. While doppler is fairly smooth and has a sinusoidal appearance, its significant
frequency contribution from a wide range of frequencies makes it unsuitable for this
representation. As seen in Figures 19 and 20, 32 poles are not sufficient to get a good
approximation in either the frequency domain or the time domain.

The last signal, heavisine, seen in Figures 21 and 22, along with the earlier results
for blocks, shows the inability of this representation to represent well (within one window)
a  discontinuity.  The  approximation  achieved  with  six  poles  closely  matches  the  general 
shape  of  the  original  signal,  but  shows  no  hint  of  the  discontinuity.  The  representation 
appears  good  in  the  frequency  domain,  but  enough  high-frequency  data  is  lost  so  that  the 
discontinuities  are  smoothed. 

As seen in Figure 23, for each signal, the L2 error decreases with each additional pole
added. This is to be expected, since the approximations are found by projecting the signal
onto the space spanned by the poles. Therefore, the error cannot increase with the addition
of each successive pole. Note that for the phonemes /IY/ and /OY/ (Figure 23 (a)-(b)),
the error with one pole is significantly lower than that for all other signals except
heavisine. Contrasting with this result is the result for the phoneme /S/ (Figure 23 (c)),
for which the L2 error with six poles exceeds that of the phonemes /IY/ and /OY/ with one
pole. In fact, the L2 error plot for /S/ is consistently higher than that of all of the
other signals. This is not unexpected, given the noise-like nature of the phoneme /S/. For
the other signals, which have features more suitable for approximation using this
representation (i.e., more speech-like), the error plots range between those for /IY/,
/OY/, and heavisine and that of /S/.
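
The monotone behavior of the error under projection can be illustrated with a small sketch. It is not the program used in this work: the test signal, the pole values, and the helper names (damped_sinusoid, projection_errors) are hypothetical, and each pole is assumed to contribute one sampled damped complex exponential. The projection onto the growing span is built by Gram-Schmidt orthogonalization, so each added atom can only remove energy from the residual.

```python
import cmath
import math


def damped_sinusoid(pole, n_samples):
    """Samples exp(pole * t), t = 0..n_samples-1; Re(pole) < 0 gives decay."""
    return [cmath.exp(pole * t) for t in range(n_samples)]


def inner(u, v):
    """Discrete inner product sum_k u[k] * conj(v[k])."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def norm(u):
    return math.sqrt(inner(u, u).real)


def projection_errors(signal, atoms):
    """Normalized L2 error after projecting onto span(atoms[:m]), m = 1..len(atoms)."""
    total = norm(signal)
    residual = [complex(x) for x in signal]
    basis = []                      # orthonormal vectors built by Gram-Schmidt
    errors = []
    for atom in atoms:
        q = list(atom)
        for b in basis:             # remove the components already spanned
            c = inner(q, b)
            q = [x - c * y for x, y in zip(q, b)]
        q_norm = norm(q)
        if q_norm > 1e-12:          # skip atoms that add no new direction
            q = [x / q_norm for x in q]
            basis.append(q)
            c = inner(residual, q)
            residual = [x - c * y for x, y in zip(residual, q)]
        errors.append(norm(residual) / total)
    return errors


# Hypothetical test signal and candidate poles (illustrative values only).
n = 64
signal = [math.exp(-0.03 * t) * math.cos(0.4 * t) + 0.2 * math.sin(0.9 * t)
          for t in range(n)]
poles = [complex(-0.03, 0.4), complex(-0.03, -0.4),
         complex(-0.1, 0.9), complex(-0.1, -0.9), complex(-0.2, 2.0)]
errors = projection_errors(signal, [damped_sinusoid(p, n) for p in poles])
print([round(e, 4) for e in errors])
```

The printed sequence is non-increasing, whatever order the poles are added in; only the rate of decrease depends on how well the chosen poles match the signal.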


7.1.2.2 Performance for different alignments. As seen in Figures 24 and 25
and in Table 1, for an analysis window containing a single glottal pulse, the alignment of
the glottal pulse within the analysis window has a major impact on the closeness of
the approximation for a fixed number of poles. Using the results for the phoneme /IY/
as an example, the values in Table 1 show that achieving the error found when
approximating with three poles in the best alignment would require 16 poles in the worst
alignment.

The results for the phoneme /OY/ are even more impressive; from Table 1 it can be
seen that a closer L2-approximation can be found with two poles, optimally aligned, than
can be found with 20 poles in the worst alignment.




Further  examination  of  Figures  24  and  25  shows  that  there  is  some  flexibility  as  to 
where  the  window  can  begin  and  still  achieve  an  approximation  with  error  similar  to  that 
of  the  best  alignment.  For  the  phoneme  /OY/  (Figure  25),  this  good  alignment  region  is 
quite  distinct,  with  a  rapid  rise  in  error  for  alignments  outside  this  region. 
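
The sensitivity to alignment can be reproduced qualitatively on synthetic data. The sketch below is a toy under stated assumptions, not the experiment reported here: the "glottal pulse" is a single damped cosine repeated every PERIOD samples, the decay and frequency values are hypothetical, and the approximation uses two real damped sinusoids anchored at the start of the window. The window is slid over one period and the normalized L2 error is recorded for each offset.

```python
import math

PERIOD = 40                 # samples per synthetic pulse period (assumed)
DECAY, FREQ = 0.1, 0.6      # hypothetical decay rate and angular frequency


def pulse(t):
    return math.exp(-DECAY * t) * math.cos(FREQ * t)


# Three periods of a pulse train; every pulse restarts at a multiple of PERIOD.
signal = [pulse(n % PERIOD) for n in range(3 * PERIOD)]

# Two real atoms anchored at the start of the analysis window.
atom_cos = [math.exp(-DECAY * t) * math.cos(FREQ * t) for t in range(PERIOD)]
atom_sin = [math.exp(-DECAY * t) * math.sin(FREQ * t) for t in range(PERIOD)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def window_error(offset):
    """Normalized L2 error of projecting one window onto span{atom_cos, atom_sin}."""
    x = signal[offset:offset + PERIOD]
    n1 = math.sqrt(dot(atom_cos, atom_cos))
    q1 = [a / n1 for a in atom_cos]
    c12 = dot(atom_sin, q1)
    r = [s - c12 * a for s, a in zip(atom_sin, q1)]
    n2 = math.sqrt(dot(r, r))
    q2 = [a / n2 for a in r]
    c1, c2 = dot(x, q1), dot(x, q2)
    resid = [xi - c1 * a - c2 * b for xi, a, b in zip(x, q1, q2)]
    return math.sqrt(dot(resid, resid) / dot(x, x))


errors = [window_error(o) for o in range(PERIOD)]
best = min(range(PERIOD), key=errors.__getitem__)
print(best, max(errors))
```

In this toy the error vanishes only when the window starts exactly at a pulse, because any other window straddles the jump where the next pulse begins. Real speech is less extreme, which is consistent with the broader region of good alignment seen in the figures.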

When the analysis window contains two glottal pulses, the results are less dramatic.
As seen in Table 1, for both phonemes /IY/ and /OY/, the error achieved with a three-pole
approximation with the best alignment can be matched by an eight-pole approximation
with the worst alignment. However, the same region of good alignment can be seen in
Figures 26 and 27 as before, although it is less distinct.

While these results may appear to be significant, it is worth noting that L2 error is,
in general, not a valid measure of perceptual closeness for speech. While I believe that
the L2 norm is more valid given the basis set (exponentially damped sinusoids) of the
approximation, an exhaustive perceptual experiment has not been conducted to confirm
this.


Figure 7.  Approximation sequence for phoneme /IY/. (a) Approximation with one pole,
(b) approximation with two poles, (c) approximation with three poles, (d)
approximation with four poles, (e) approximation with five poles, and (f)
approximation with six poles. In each plot, the approximation is shown (solid
line) superimposed over the original data (dashed line). [Plots not reproduced;
surviving panel labels: (c) 4 poles, (d) 8 poles, (e) 16 poles, (f) 32 poles.]


Figure 8.  Fourier transform of the approximation sequence for phoneme /IY/. Fourier
transforms of: (a) approximation with one pole, (b) approximation with two
poles, (c) approximation with three poles, (d) approximation with four poles,
(e) approximation with five poles, and (f) approximation with six poles. In
each plot, the Fourier transform of the approximation is shown (solid line)
superimposed over the Fourier transform of the original data (dashed line).
[Plots not reproduced; panel labels: (a) 1 pole, (b) 2 poles, (c) 4 poles,
(d) 8 poles, (e) 16 poles, (f) 32 poles.]


Figure 10.  Fourier transform of the approximation sequence for phoneme /OY/. Fourier
transform of approximation with (a) one pole, (b) two poles, (c) four poles,
(d) eight poles, (e) 16 poles, and (f) 32 poles. In each plot, the Fourier
transform of the approximation is shown (solid line) superimposed over the
Fourier transform of the original data (dashed line).


Figure 11.  Approximation sequence for phoneme /S/. Approximation with (a) one pole,
(b) two poles, (c) four poles, (d) eight poles, (e) 16 poles, and (f) 32 poles.
In each plot, the approximation is shown (solid line) superimposed over the
original data (dashed line).


Figure 12.  Fourier transform of the approximation sequence for phoneme /S/. Fourier
transform of approximation with (a) one pole, (b) two poles, (c) four poles,
(d) eight poles, (e) 16 poles, and (f) 32 poles. In each plot, the Fourier
transform of the approximation is shown (solid line) superimposed over the
Fourier transform of the original data (dashed line).


Figure 13.  Approximation sequence for blocks. Approximation with (a) one pole, (b) two
poles, (c) four poles, (d) eight poles, (e) 16 poles, and (f) 32 poles. In each
plot, the approximation is shown (solid line) superimposed over the original
data (dashed line).


Figure  14.  Fourier  transform  of  the  approximation  sequence  for  blocks.  Fourier  transform 
of  approximation  with  (a)  one  pole,  (b)  two  poles,  (c)  four  poles,  (d)  eight 
poles,  (e)  16  poles,  and  (f)  32  poles.  In  each  plot,  the  Fourier  transform  of  the 
approximation  is  shown  (solid  line)  superimposed  over  the  Fourier  transform 
of  the  original  data  (dashed  line). 


Figure 15.  Approximation sequence for bumps. Approximation with (a) one pole, (b) two
poles, (c) four poles, (d) eight poles, (e) 16 poles, and (f) 32 poles. In each
plot, the approximation is shown (solid line) superimposed over the original
data (dashed line).




Figure  16.  Fourier  transform  of  the  approximation  sequence  for  bumps.  Fourier  transform 
of  approximation  with  (a)  one  pole,  (b)  two  poles,  (c)  four  poles,  (d)  eight 
poles,  (e)  16  poles,  and  (f)  32  poles.  In  each  plot,  the  Fourier  transform  of  the 
approximation  is  shown  (solid  line)  superimposed  over  the  Fourier  transform 
of  the  original  data  (dashed  line). 


Figure 17.  Approximation sequence for bumpstx. Approximation with (a) one pole, (b)
two poles, (c) four poles, (d) eight poles, (e) 16 poles, and (f) 32 poles. In each
plot, the approximation is shown (solid line) superimposed over the original
data (dashed line).


Figure 18.  Fourier transform of the approximation sequence for bumpstx. Fourier
transform of approximation with (a) one pole, (b) two poles, (c) four poles, (d)
eight poles, (e) 16 poles, and (f) 32 poles. In each plot, the Fourier transform
of the approximation is shown (solid line) superimposed over the Fourier
transform of the original data (dashed line).


Figure 19.  Approximation sequence for doppler. Approximation with (a) one pole, (b)
two poles, (c) four poles, (d) eight poles, (e) 16 poles, and (f) 32 poles. In each
plot, the approximation is shown (solid line) superimposed over the original
data (dashed line).




Figure  21.  Approximation  sequence  for  heavisine.  Approximation  with  (a)  one  pole,  (b) 
two  poles,  (c)  four  poles,  (d)  eight  poles,  (e)  16  poles,  and  (f)  32  poles.  In  each 
plot,  the  approximation  is  shown  (solid  line)  superimposed  over  the  original 
data  (dashed  line). 


Figure 22.  Fourier transform of the approximation sequence for heavisine. Fourier
transform of approximation with (a) one pole, (b) two poles, (c) four poles, (d)
eight poles, (e) 16 poles, and (f) 32 poles. In each plot, the Fourier transform
of the approximation is shown (solid line) superimposed over the Fourier
transform of the original data (dashed line).


Figure 23.  Number of poles vs. normalized L2 error for each signal. The error is
normalized by the L2 norm of the original signal. (a) The phoneme /IY/, (b) the
phoneme /OY/, (c) the phoneme /S/, (d) blocks, (e) bumps, (f) bumpstx, (g)
doppler, and (h) heavisine.


Table 1.  [Table body not recoverable from the source. As cited in the text, Table 1
lists the L2 error by number of poles for the best and worst alignments of the
analysis window, for the phonemes /IY/ and /OY/.]


Figure 24.  Data offset vs. normalized L2 error for the phoneme /IY/ for an analysis
window containing one pulse. The original data is shown over the histogram for
comparative purposes. The offset used to generate a given value in the
histogram is the abscissa point of the original data directly above the histogram
value. The error is normalized by the L2 norm of the original signal.
Approximation using (a) one pole, (b) two poles, (c) three poles, (d) six poles, (e)
nine poles, and (f) 18 poles.


Figure 25.  Data offset vs. normalized L2 error for the phoneme /OY/ for an analysis
window containing one pulse. The original data is shown over the histogram
for comparative purposes. The offset used to generate a given value in the
histogram is the abscissa point of the original data directly above the
histogram value. The error is normalized by the L2 norm of the original signal.
Approximation using (a) one pole, (b) two poles, (c) three poles, (d) six poles,
(e) nine poles, and (f) 18 poles.


Figure 26.  Data offset vs. normalized L2 error for the phoneme /IY/ for an analysis
window containing two pulses. The original data is shown over the histogram for
comparative purposes. The offset used to generate a given value in the
histogram is the abscissa point of the original data directly above the histogram
value. The error is normalized by the L2 norm of the original signal.
Approximation using (a) one pole, (b) two poles, (c) three poles, (d) six poles, (e)
nine poles, and (f) 18 poles.


Figure 27.  Data offset vs. normalized L2 error for the phoneme /OY/ for an analysis
window containing two pulses. The original data is shown over the histogram
for comparative purposes. The offset used to generate a given value in the
histogram is the abscissa point of the original data directly above the
histogram value. The error is normalized by the L2 norm of the original signal.
Approximation using (a) one pole, (b) two poles, (c) three poles, (d) six poles,
(e) nine poles, and (f) 18 poles.




7.2  Medium-scale  analyses 

The fine-scale analyses above examined the performance of the representation on the
scale of a single analysis window. The goal of the medium-scale analysis is to examine the
performance of the frame described in Chapter VI, based on Theorem 4.2.4, and the
algorithm to find the frame representation given in Theorem 4.3.4.

The performance of this representation is examined in two ways. First, representation
of harmonic speech is analyzed. Next, representation of non-speech is analyzed. The error
is analyzed for differing numbers of basis functions and differing overlaps between basis
functions.

7.2.1 Description of medium-scale analyses.

7.2.1.1 Performance on harmonic speech. Two segments of speech, the
phoneme /IY/ from a female speaker and the phoneme /OY/ from a male speaker, each
consisting of several glottal pulses, were used in this analysis. The program’s pulse finder
was used to determine the estimated starting times of the glottal pulses in the segments.
Both segments and the pulse start times as identified by the program are shown in
Figure 28. The spectrograms for each segment are shown in Figure 29.

Approximations were found for analysis windows overlapping by 0%, 50%, 67%, and
75%. Note that this corresponds to each data point being contained in one, two, three,
and four analysis windows, respectively. For each of these cases, the analysis window decay
rate was varied through 0, 25, 50, 75, and 100. Approximations using three, six, 10, and
32 poles per glottal pulse were found for each case.
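
The correspondence between overlap percentage and the number of windows covering each point follows from simple arithmetic: if consecutive windows of equal length advance by 1/k of their length, they overlap by 1 - 1/k and every interior point lies in k windows. The sketch below checks this by direct counting. It assumes idealized, equally spaced windows (the windows used here are aligned to glottal pulses, so this is only an approximation of that setting), and the function names are hypothetical.

```python
from fractions import Fraction


def overlap_for_coverage(k):
    """Fractional overlap between consecutive equal-length windows when
    every interior point lies in exactly k windows (hop = length / k)."""
    return 1 - Fraction(1, k)


def windows_containing(point, length, hop, n_windows):
    """Count windows [i*hop, i*hop + length) that contain `point`."""
    return sum(1 for i in range(n_windows) if i * hop <= point < i * hop + length)


length = 12                     # hypothetical window length, divisible by 1..4
summary = {}
for k in (1, 2, 3, 4):
    hop = length // k
    percent = round(100 * overlap_for_coverage(k))
    # Count coverage at a point far from the ends of the window sequence.
    summary[percent] = windows_containing(5 * length, length, hop, 100)

print(summary)
```

The resulting mapping pairs the overlaps 0%, 50%, 67%, and 75% with coverage by one, two, three, and four windows, matching the cases listed above.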

The data-dependent analysis window selection method was used. That is, the starting
points of the glottal pulses were used as the starting points of the analysis windows, and
an additional analysis window starting at the first sample point was added to the set.
To prevent additional analysis windows from being inserted between glottal pulses for the
cases of 67% and 75% overlapping windows, the maximum length of the analysis window
was set to 256 sample data points. This was done for the sake of fair comparisons between
the cases of differing overlap amounts.

The same analysis window start points are used for each of the different analysis
window overlap amounts; the window length is simply adjusted to give the desired overlap
amount. Additional windows are added starting before the signal start time as needed
to achieve the desired overlaps at the start of the segment. All analysis window selection
was done automatically by heuristic algorithms in the program, as described in Section 6.3
and Appendix C. The analysis window selection is independent of the window decay rate,
the number of poles to be chosen, and the number of iterations of the reconstruction
approximation to be performed.

The segment containing the phoneme /IY/ consists of 343 sample points and was
determined  to  contain  nine  glottal  pulses,  yielding  a  basic  set  of  10  analysis  windows.  The 
segment  containing  the  phoneme  /OY/  consists  of  512  sample  points  and  was  determined 
to  contain  eight  glottal  pulses,  for  a  basic  set  of  nine  analysis  windows.  For  the  cases  of 
analysis  window  overlaps  of  50%,  67%  and  75%,  one,  two,  and  three  additional  analysis 
windows  were  added,  respectively,  to  give  the  desired  amount  of  overlap.  Each  additional 
analysis  window  begins  at  a  time  before  the  first  sampled  data  point.  The  candidate  poles 
were  the  same  for  every  case,  and  were  chosen  automatically  as  described  in  Section  6.3. 
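
The window counts quoted above can be checked with a short bookkeeping sketch. It is not the heuristic of Section 6.3 or Appendix C, which is not reproduced here. It assumes 1-based sample indices, takes the pulse start points for /IY/ from the grid-line labels of Figure 28, and uses a hypothetical rule in which window i runs from boundary i to boundary i + k, with k - 1 extra windows placed before the first sample at the mean pulse spacing.

```python
def analysis_windows(pulse_starts, n_samples, coverage, max_len=256, first_sample=1):
    """Return (start, length) pairs for pulse-aligned windows.

    Hypothetical bookkeeping: window i runs from boundary i to boundary
    i + coverage, so each interior point lies in `coverage` windows.
    """
    starts = sorted(set([first_sample] + list(pulse_starts)))
    # Mean pulse spacing, used only to place boundaries outside the segment.
    spacing = round((starts[-1] - starts[0]) / (len(starts) - 1))
    end = first_sample + n_samples          # one past the last sample
    lead = [first_sample - spacing * j for j in range(coverage - 1, 0, -1)]
    tail = [end + spacing * j for j in range(coverage)]
    grid = lead + starts + tail
    n_windows = len(lead) + len(starts)
    return [(grid[i], min(grid[i + coverage] - grid[i], max_len))
            for i in range(n_windows)]


iy_pulses = [20, 55, 90, 124, 159, 193, 227, 261, 295]   # labels from Figure 28
counts = {k: len(analysis_windows(iy_pulses, 343, k)) for k in (1, 2, 3, 4)}
print(counts)
```

Under these assumptions the nine pulses plus the window at the first sample give the basic set of 10 windows, and coverage by two, three, and four windows adds one, two, and three windows before the segment start, in agreement with the counts stated in the text.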

The  program  was  run  for  all  combinations  of  the  cases.  The  poles  chosen  in  each 
case  are  shown  graphically  in  Figures  31  through  38. 

The program was set to use 11 iterations (numbered 0 through 10) of the approximation
given in Theorem 4.3.4 to determine the approximation of the speech segment. The
error  after  each  iteration  of  the  approximation  process  was  found.  The  error  for  each  case 
is  shown  graphically  in  Figures  39  through  46. 
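
Theorem 4.3.4 is not reproduced in this section, so the following sketch uses the classical frame algorithm as a stand-in to show how the error of an iterative inverse-frame-operator approximation can be tracked per iteration. Everything in it is an assumption made for illustration: the three-vector frame for R^2, its frame bounds A = 1 and B = 3 (the eigenvalues of the frame operator S = [[2, 1], [1, 2]]), and the test vector f.

```python
import math

FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # toy frame for R^2 (hypothetical)
A, B = 1.0, 3.0                                 # frame bounds of this toy frame


def analysis(f):
    """Frame coefficients <f, v_k>."""
    return [f[0] * v[0] + f[1] * v[1] for v in FRAME]


def synthesis(coeffs):
    """Sum of c_k * v_k."""
    return (sum(c * v[0] for c, v in zip(coeffs, FRAME)),
            sum(c * v[1] for c, v in zip(coeffs, FRAME)))


def frame_algorithm(coeffs, n_iter):
    """Classical frame algorithm: f_{n+1} = f_n + 2/(A+B) * S(f - f_n)."""
    relax = 2.0 / (A + B)
    estimate = (0.0, 0.0)
    history = []
    for _ in range(n_iter):
        residual = [c - d for c, d in zip(coeffs, analysis(estimate))]
        step = synthesis(residual)              # equals S(f - estimate)
        estimate = (estimate[0] + relax * step[0],
                    estimate[1] + relax * step[1])
        history.append(estimate)
    return history


f = (3.0, -1.0)
history = frame_algorithm(analysis(f), 11)      # iterations numbered 0 through 10
errors = [math.hypot(f[0] - g[0], f[1] - g[1]) for g in history]
print(errors)
```

For this frame the error contracts by the factor (B - A)/(B + A) = 1/2 at every iteration, so most of the improvement occurs in the first few iterations. How quickly the approximation of Theorem 4.3.4 converges on speech data depends on the actual frame bounds.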

7.2.1.2 Performance on non-speech signals. In this analysis, the performance
on two non-speech signals is examined. The performance is compared to that with
the phoneme /OY/, using the same segment as before. A single row of a digitized
image, Lenna, was used as the first non-speech signal. The second is reversed speech,
that is, sampled speech in reverse order. For this sample, the same data points as are
in the control sample (the phoneme /OY/) are used, in reverse order. Since non-speech
does not necessarily have a glottal pulse structure that could be profitably used by the
program, evenly spaced analysis windows of length 128 sample points with 50% overlap
between segments are used in this analysis. Due to some unexpected results to be
described below, the program was also run for the case of analysis windows aligned with the
glottal pulses, with 50% window overlap and an analysis window maximum length of 128
samples, to give additional data for comparison. The segments used by the program are
shown in Figure 30. The analysis window decay rate was varied through 0, 25, 50, 75, and
100. Approximations for three, six, 10, and 32 poles were found for each case. All data
segments contained 512 data points and nine analysis windows were used.
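
The count of nine analysis windows is consistent with the evenly spaced layout, as the following sketch suggests. It assumes 1-based sample indices (as in the grid-line labels of Figure 30) and assumes that one additional window is placed before the first sample to complete the 50% overlap at the start of the segment, as was done for the harmonic speech; the function name is hypothetical.

```python
def even_window_starts(n_samples, length, coverage, first_sample=1):
    """Start indices of evenly spaced windows in which each point is covered
    `coverage` times; extra windows begin before the first sample."""
    hop = length // coverage
    first_start = first_sample - (coverage - 1) * hop
    return list(range(first_start, first_sample + n_samples, hop))


starts = even_window_starts(512, 128, 2)
print(starts, len(starts))
```

With a hop of 64 samples, eight windows start inside the segment and one starts 64 samples before it, for nine windows in total.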


Figure 28.  Data segments used in medium-scale analysis. The estimated starting points
of the glottal pulses, as determined by the program, are shown by the vertical
grid lines. Additional vertical grid lines were placed to clearly delineate the
starting points and ending points of the segments. Note that the heuristic
pulse-finding algorithm (described in Appendix C) may not perform well near
the beginnings and ends of segments, as can be seen near the end of the
segment of the phoneme /IY/, due to insufficient data. [Plots not reproduced;
grid-line labels: phoneme /IY/ at samples 20, 55, 90, 124, 159, 193, 227, 261,
295; phoneme /OY/ at samples 54, 114, 173, 232, 290, 349, 409, 469.]

The  program  was  run  for  all  combinations  of  the  cases.  The  program  was  set  to 
use  11  iterations  (numbered  0  through  10)  of  the  approximation  given  in  Theorem  4.3.4 
to  determine  the  approximation  of  the  speech  segment.  The  error  after  each  iteration  of 
the  approximation  process  was  found.  The  error  for  each  case  is  shown  graphically  in 
Figures  47  through  50.  The  minimum  errors  for  each  signal  for  each  number  of  poles  in 
the  approximation  are  compared  in  Figure  51. 

7.2.2 Discussion of results. The results for these analyses were mixed, as will be
described below. On the one hand, the results for harmonic speech were not unexpected.
However, the results for the representations of non-speech were unexpected, with one
non-speech signal being better represented than the comparison speech signal. A reasonable
explanation for this occurrence is proposed.

Figure 29.  Spectrograms of harmonic speech segments used in medium-scale analysis. (a)
Phoneme /IY/. (b) Phoneme /OY/.

7.2.2.1 Performance on harmonic speech. Figures 31 through 38 illustrate
the  behavior  of  the  heuristic  which  chooses  poles.  Comparing  these  figures  to  the  respective 
spectrograms  in  Figure  29,  one  can  see  that  the  poles  are  chosen  along  the  formant  locations 
first,  with  later  poles  filling  in  detail  away  from  the  main  formants.  The  formant  locations 
are  especially  prominent  in  the  plots  for  the  six  pole  and  10  pole  approximations.  In  the 
plots  for  the  32  pole  approximations,  enough  detail  information  has  been  added  that  a 
plot  of  poles,  without  magnitude  information,  cannot  reveal  the  formant  locations  well. 
Examining  the  plots  for  the  three  pole  approximations,  it  appears  that  there  may  be  some 
benefit  to  the  more  rapidly  decaying  analysis  windows  in  that  the  first  two  formants  are 
more  quickly  filled-in. 




Figure 30.  Non-speech data segments used in medium-scale analysis. The starting points
of the analysis windows are shown by the vertical grid lines. Additional vertical
grid lines were placed to clearly delineate the starting points and ending
points of the segments. (a) One row of the digitized image, Lenna. (b) The
phoneme /OY/, reversed. [Plots not reproduced; grid-line labels in both panels:
samples 1, 65, 129, 193, 257, 321, 385, 449.]

Figures 39 through 46 show both the behavior of the inverse frame operator algorithm
described in Theorem 4.3.4 and the circumstances under which having overlapping analysis
windows gives a definite L2 norm advantage. As can be seen in these figures, the iterative
algorithm converges reasonably quickly, with only negligible improvements after five
iterations.

Examination of Figures 39 through 42 reveals that, for the phoneme /IY/, having
overlapping analysis windows gives a definite L2 norm advantage with the three pole per
glottal pulse approximations but a negligible improvement or even a slight degradation
in the approximations with more poles per glottal pulse. Figures 43 through 46 reveal a
similar story, although with the phoneme /OY/ there is some noticeable advantage to having
overlapping windows with the six poles per glottal pulse approximations.




7.2.2.2 Performance on non-speech signals. The results for representation
of non-speech signals were not as expected. Instead of superior performance on speech as
opposed to non-speech, the opposite occurred. In proposing explanations for this
unexpected result, an interesting observation is made concerning longer analysis windows.

Comparing Figures 47 and 48, it can be seen that while the reversed speech was
slightly less well represented with six- and 10-pole approximations than the
speech in the correct order, for the 32-pole approximations the difference was
negligible, and for the three-pole approximations, the reversed speech was
actually represented better. That the reversed speech was represented so well
was unexpected.

An even more unexpected result is found in comparisons between Figures 47 and
49. Here, the non-speech Lenna is much better represented than the phoneme /OY/
for approximations of all tested numbers of poles. This unexpected result
required further investigation. Figure 50 shows the error of approximations of
the phoneme /OY/ using analysis windows aligned with glottal pulses. The data in
this figure suggest a much improved representation with glottal pulse alignment,
but they are not conclusive, since aligning the segments with the glottal pulses
required one more analysis window, and hence more basis elements were used for
the approximations. Comparing this figure with Figures 47, 48, and 49, it can be
seen that the aligned analysis windows gave a better representation of the
phoneme /OY/ than the non-aligned windows, but that Lenna was still better
represented. A comparison of the best approximations of each of the four signals
is shown in Figure 51.

Because these results were so unexpected, the data were examined in more detail.
Seen in Figure 52 are the data in the analysis windows used in finding the
approximations to Lenna and /OY/. Here, it can be seen that Lenna appears far
less regular than /OY/. However, as seen in Figure 53, the Fourier transform of
the segments of Lenna is much simpler than that of the segments of /OY/. The
implication is that the segments of Lenna may be more easily represented in our
basis (in an L2 norm sense), despite their irregular appearance.

Also revealed in Figure 53 are some shortcomings of using longer analysis
windows. As can be seen in the transforms of the segments of /OY/, the segments
containing data points from two glottal pulses (segments 1 through 7) have a
spectrum with more detail (i.e., the more jagged appearance is due to the
presence of multiple pulses in the analysis window). Segment 8, having data from
only one glottal pulse, does not have this level of detail. Since the additional
detail requires more poles to represent it well, longer analysis windows may be
poorly represented by the same number of poles.
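
The segmentation behind this comparison can be sketched as follows. This is a
minimal illustration, assuming only what the captions of Figures 52 and 53
state: 50% overlap, a zeroth segment padded in front with zeros, and a last
segment padded at the end with zeros. The window length of 128 samples and the
function names are assumptions, and no window decay is applied.

```python
import numpy as np

def half_overlap_segments(x, win_len):
    """Split x into segments of win_len samples that overlap by 50%.

    The zeroth segment is padded in front with zeros and the last
    segment is padded at the end with zeros.
    """
    hop = win_len // 2
    n_seg = -(-len(x) // hop) + 1            # ceil(len(x) / hop) + 1
    padded = np.zeros((n_seg - 1) * hop + win_len)
    padded[hop:hop + len(x)] = x             # leave `hop` zeros in front
    return np.stack([padded[j * hop:j * hop + win_len] for j in range(n_seg)])

def segment_spectra(segments):
    """Magnitude of the one-sided DFT of each segment."""
    return np.abs(np.fft.rfft(segments, axis=1))

# Hypothetical 512-sample signal standing in for a row of data.
signal = np.sin(2 * np.pi * 5 * np.arange(512) / 512)
segments = half_overlap_segments(signal, win_len=128)
spectra = segment_spectra(segments)
print(segments.shape, spectra.shape)
```

Plotting each row of the spectra array gives the kind of per-segment view used
to judge how much spectral detail a fixed number of poles would have to capture.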


[Figure 31: panel titles recovered for 0%, 50%, 67%, and 75% overlap with an
exp(-0t) window; remaining panels and plot data not recoverable.]

Figure 31.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /IY/ for approximations with three poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


[Figure 32: panels titled by window overlap (0%, 50%, 67%, 75%) and window
decay (exp(-0t) through exp(-100t)); horizontal-axis ticks 1 and 295;
vertical-axis ticks 0 to 4000; plot data not recoverable.]

Figure 32.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /IY/ for approximations with six poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


[Figure 33: panels titled by window overlap (0%, 50%, 67%, 75%) and window
decay (exp(-0t), exp(-25t), exp(-50t), exp(-75t), exp(-100t)); horizontal-axis
ticks 1 and 295; vertical-axis ticks 0 to 4000; plot data not recoverable.]

Figure 33.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /IY/ for approximations with 10 poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


[Figure 34: panel titles recovered for 0%, 50%, 67%, and 75% overlap with an
exp(-0t) window; remaining panels and plot data not recoverable.]

Figure 34.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /IY/ for approximations with 32 poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


Figure 35.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /OY/ for approximations with three poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


[Figure 36: panel titles recovered for 0%, 50%, 67%, and 75% overlap with an
exp(-0t) window; remaining panels and plot data not recoverable.]

Figure 36.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /OY/ for approximations with six poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


Figure 37.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /OY/ for approximations with 10 poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


Figure 38.  Poles chosen for varying window overlaps and window decay rates for
the phoneme /OY/ for approximations with 32 poles per glottal pulse. The
imaginary portion of the pole is plotted vs. the sample number of the window
starting location. Only the poles determined for analysis windows with starting
points within the speech segment are shown.


[Figure 39: bar-graph panels titled by window overlap (50%, 67%, 75%) and
window decay (exp(-0t) through exp(-100t)); plot data not recoverable.]

Figure 39.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /IY/ for approximations with
three poles per glottal pulse. The thick dashed line marks the minimum error
over all cases for a three pole approximation. For each decay rate analyzed and
for window overlaps of 50%, 67% and 75%, the normalized error for the same decay
rate for non-overlapping windows is shown by the thin dashed line superimposed
over the bar graph.


[Figure 40: bar-graph panels titled by window overlap (50%, 67%, 75%) and
window decay (exp(-0t) through exp(-75t) recovered); plot data not recoverable.]

Figure 40.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /IY/ for approximations with six
poles per glottal pulse. The thick dashed line marks the minimum error over all
cases for a six pole approximation. For each decay rate analyzed and for window
overlaps of 50%, 67% and 75%, the normalized error for the same decay rate for
non-overlapping windows is shown by the thin dashed line superimposed over the
bar graph.


[Figure 41: bar-graph panels titled by window overlap (50%, 67%, 75%) and
window decay (exp(-0t), exp(-25t), exp(-50t), exp(-75t), exp(-100t)); plot data
not recoverable.]

Figure 41.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /IY/ for approximations with 10
poles per glottal pulse. The thick dashed line marks the minimum error over all
cases for a 10 pole approximation. For each decay rate analyzed and for window
overlaps of 50%, 67% and 75%, the normalized error for the same decay rate for
non-overlapping windows is shown by the thin dashed line superimposed over the
bar graph.


[Figure 42: bar-graph panels titled by window overlap (50%, 67%, 75%) and
window decay (exp(-0t) through exp(-100t)); plot data not recoverable.]

Figure 42.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /IY/ for approximations with 32
poles per glottal pulse. The thick dashed line marks the minimum error over all
cases for a 32 pole approximation. For each decay rate analyzed and for window
overlaps of 50%, 67% and 75%, the normalized error for the same decay rate for
non-overlapping windows is shown by the thin dashed line superimposed over the
bar graph.


Figure 43.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /OY/ for approximations with
three poles per glottal pulse. The thick dashed line marks the minimum error
over all cases for a three pole approximation. For each decay rate analyzed and
for window overlaps of 50%, 67% and 75%, the normalized error for the same decay
rate for non-overlapping windows is shown by the thin dashed line superimposed
over the bar graph.


[Figure 44: bar-graph panels titled by window overlap (50%, 67%, 75%) and
window decay (exp(-0t), exp(-25t), exp(-50t), exp(-75t), exp(-100t)); plot data
not recoverable.]

Figure 44.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /OY/ for approximations with six
poles per glottal pulse. The thick dashed line marks the minimum error over all
cases for a six pole approximation. For each decay rate analyzed and for window
overlaps of 50%, 67% and 75%, the normalized error for the same decay rate for
non-overlapping windows is shown by the thin dashed line superimposed over the
bar graph.


[Figure 45: bar-graph panels titled by window overlap (50%, 67%, 75%) and
window decay (exp(-0t) through exp(-100t) recovered); plot data not
recoverable.]

Figure 45.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /OY/ for approximations with 10
poles per glottal pulse. The thick dashed line marks the minimum error over all
cases for a 10 pole approximation. For each decay rate analyzed and for window
overlaps of 50%, 67% and 75%, the normalized error for the same decay rate for
non-overlapping windows is shown by the thin dashed line superimposed over the
bar graph.


[Figure 46: bar-graph panels titled by window overlap (50%, 67%, 75%) and
window decay (exp(-0t) through exp(-100t)); plot data not recoverable.]

Figure 46.  Number of iterations vs. normalized error for varying window
overlaps and window decay rates for the phoneme /OY/ for approximations with 32
poles per glottal pulse. The thick dashed line marks the minimum error over all
cases for a 32 pole approximation. For each decay rate analyzed and for window
overlaps of 50%, 67% and 75%, the normalized error for the same decay rate for
non-overlapping windows is shown by the thin dashed line superimposed over the
bar graph.


[Figure 47: bar-graph panels titled by number of poles (3, 6, 10, 32) and
window decay (exp(-0t), exp(-25t), exp(-50t), exp(-75t), exp(-100t));
horizontal axis: iterations, with ticks at 0, 1, 5, and 10; plot data not
recoverable.]

Figure 47.  Number of iterations vs. normalized error for a 50% window overlap,
evenly spaced analysis windows, and varying window decay rates and numbers of
poles for the phoneme /OY/. The thick dashed line marks the minimum error over
all cases of approximations with the same number of poles.


[Figure 48: bar-graph panels titled by number of poles (3, 6, 10, 32) and
window decay (exp(-0t) through exp(-100t)); horizontal axis: iterations;
vertical axis: L2 error; plot data not recoverable.]

Figure 48.  Number of iterations vs. normalized error for a 50% window overlap,
evenly spaced analysis windows, and varying window decay rates and numbers of
poles for the phoneme /OY/, reversed. The thick dashed line marks the minimum
error over all cases of approximations with the same number of poles.


[Figure 49: bar-graph panels titled by number of poles (3, 6, 10, 32) and
window decay (exp(-0t) through exp(-100t)); horizontal axis: iterations, with
ticks at 0, 1, 5, and 10; plot data not recoverable.]

Figure 49.  Number of iterations vs. normalized error for a 50% window overlap,
evenly spaced analysis windows, and varying window decay rates and numbers of
poles for a row of the digitized image, Lenna. The thick dashed line marks the
minimum error over all cases of approximations with the same number of poles.


[Figure 50: bar-graph panels titled by number of poles (3, 6, 10, 32) and
window decay (exp(-0t), exp(-25t), exp(-50t), exp(-75t), exp(-100t));
horizontal axis: iterations; vertical axis: L2 error; plot data not
recoverable.]

Figure 50.  Number of iterations vs. normalized error for a 50% window overlap
and varying window decay rates and number of poles for the phoneme /OY/ with
analysis windows aligned with glottal pulses. The thick dashed line marks the
minimum error over all cases of approximations with the same number of poles.


[Figure 51: horizontal axis: number of poles used; vertical axis: L2
difference; plot data not recoverable.]

Figure 51.  Normalized error for approximation of various non-speech and speech
samples. For each sample and each number of poles used, the minimum error over
all window decay rates is plotted. Data points denoted “o” mark the normalized
error for the speech phoneme /OY/. Those denoted “r” and “l” mark the normalized
error for the reversed phoneme /OY/ and the row of the digitized image Lenna,
respectively. The data points denoted “a”, shown for comparison purposes, mark
the normalized error of approximations of the phoneme /OY/ with analysis windows
aligned with the glottal pulses.


Figure 52.  Non-aligned segments used in finding the approximations of Lenna and
/OY/. Note that these segments overlap by 50% and that the last one is zero
padded. The zeroth segment, padded before with zeros, is not shown. Shown are
the segments used for the approximations for (a) Lenna and (b) /OY/.


Figure 53.  Fourier transforms of non-aligned segments used in finding the
approximations of Lenna and /OY/. Note that these segments overlap by 50% and
that the last one is zero padded. The zeroth segment, padded before with zeros,
is not shown. Shown are the Fourier transforms of segments used for the
approximations for (a) Lenna and (b) /OY/.


7.3  Large-scale  analysis 

The fine- and medium-scale analyses examined aspects of the mathematics and
heuristics on a small scale, too small for listening tests. However, in speech
processing, listening tests are the only test of accuracy generally accepted as
reliable. In addition, the results of the previous sections were obtained on a
very limited number of phonemes, so there is no guarantee that they extend to
longer stretches of more varied speech. The large-scale analysis done here is
intended to examine both of these issues.

7.3.1  Description of large-scale analyses.  The purpose of this analysis is to
examine the performance of the program on longer segments of digitized speech.
Both clean speech and noisy speech of varying signal-to-noise ratio (SNR) are
used in this analysis. All noise is additive Gaussian white noise.

Two complete sentences from the TIMIT database, sa1 spoken by female speaker
fcmm0 and sx194 spoken by male speaker mcmj0, are used in this analysis. To
create the noisy versions of these sentences, Gaussian white noise of the
amplitude necessary to create the desired SNR was added to each sentence. The
SNRs chosen were 10 dB, 6 dB, 3 dB and 0 dB, calculated with respect to the
entire digitized sentence according to

$$\mathrm{SNR} = 20 \log_{10} \frac{\sigma_s}{\sigma_n},$$

where $\sigma_s$ and $\sigma_n$ are the L2 norms of the speech and the noise,
respectively.
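
The SNR definition above translates directly into a noise-scaling step. The
following is a minimal sketch, assuming the sentence is held in a NumPy array;
the function names and the synthetic test tone standing in for a digitized
sentence are hypothetical, not taken from the dissertation's program.

```python
import numpy as np

def snr_db(speech, noise):
    """SNR in dB as 20 log10 of the ratio of the L2 norms."""
    return 20.0 * np.log10(np.linalg.norm(speech) / np.linalg.norm(noise))

def add_white_noise(speech, target_snr_db, rng=None):
    """Add Gaussian white noise scaled to give the requested SNR.

    The SNR is computed over the entire signal, as in the text.
    Returns the noisy signal and the noise that was added.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(speech.shape)
    # Rescale so that ||speech|| / ||noise|| = 10^(SNR / 20).
    target_norm = np.linalg.norm(speech) / 10.0 ** (target_snr_db / 20.0)
    noise *= target_norm / np.linalg.norm(noise)
    return speech + noise, noise

# Hypothetical stand-in for a digitized sentence.
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 8000.0)
for target in (10, 6, 3, 0):
    noisy, noise = add_white_noise(speech, target, np.random.default_rng(0))
    print(target, round(snr_db(speech, noise), 6))
```

Scaling a single noise realization, rather than drawing noise with a nominal
variance, makes the achieved SNR match the target exactly for that sentence.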

The  spectrograms  of  the  clean  and  noisy  sentences  are  shown  in  Figures  54  and  55. 
These  figures  illustrate  the  usefulness  of  spectrograms,  which  are  described  more  fully  in 
Section  2.1.  Details  which  are  visible  in  the  spectrograms  of  clean  speech  are  masked  in 
the  noisy  speech.  As  the  noise  levels  increase,  more  detail  is  lost,  corresponding  to  greater 
difficulty  in  understanding  the  underlying  speech. 

For the cases of clean speech and the 6 dB SNR speech, approximations with one,
two, three, six, 10, and 15 poles per glottal pulse, for window decays of 0, 50,
and 100, and for window overlaps of 0%, 50% and 67% were found. These results
are shown in Figures 56 and 57. Since the quantitative results did not vary
significantly with varying window decay rates and window overlaps, analyses on
the 0 dB, 3 dB, and 10 dB SNR speech were done only for window decay rates of 0
and window overlaps of 50%. These results are shown in Figures 58 and 59.
Spectrograms of representative and illustrative approximations are shown in
Figures 60 through 63.


Figure 55.  Spectrograms of clean and noisy speech (sx194). (a) Clean speech.
Noisy speech with an SNR of: (b) 10 dB, (c) 6 dB, (d) 3 dB, and (e) 0 dB.


Poles per          Compression ratio     Compression ratio
glottal pulse      (sentence sa1)        (sentence sx194)

     1                 24.0 : 1              27.2 : 1
     2                 12.0 : 1              13.6 : 1
     3                  8.0 : 1               9.1 : 1
     6                  4.0 : 1               4.5 : 1
    10                  2.4 : 1               2.7 : 1
    15                  1.6 : 1               1.8 : 1

Table 2.  Compression ratios for approximations based on different numbers of
poles per glottal pulse.

The compression ratios achieved with each approximation are shown in Table 2.
This is a ratio of numbers required, not a ratio of bits required. The ratio was
computed by dividing the number of samples in the segment by twice the number of
poles required for the entire approximation. The factor of two accounts for the
fact that both the identity of the poles being used and the coefficients
corresponding to these poles must be retained. The locations of the glottal
pulses are not included in this number.
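
As a worked illustration of this calculation, the sketch below computes the
ratio for each number of poles per glottal pulse. The sample count and the
number of glottal pulses are hypothetical values, chosen only so that the
one-pole case gives 24.0 : 1; the actual counts for the TIMIT sentences are not
stated in this section.

```python
def compression_ratio(n_samples, n_poles_total):
    """Samples in the segment divided by twice the total number of poles.

    The factor of two counts one number for the identity of each pole
    and one for its coefficient. Glottal pulse locations are not counted.
    """
    if n_poles_total <= 0:
        raise ValueError("the approximation must use at least one pole")
    return n_samples / (2 * n_poles_total)

# Hypothetical sizes, not taken from the dissertation.
n_samples = 48_000
n_glottal_pulses = 1_000

for poles_per_pulse in (1, 2, 3, 6, 10, 15):
    total_poles = poles_per_pulse * n_glottal_pulses
    ratio = compression_ratio(n_samples, total_poles)
    print(f"{poles_per_pulse:2d} poles per glottal pulse: {ratio:.1f} : 1")
```

Because the total number of poles grows linearly with the number of poles per
glottal pulse, the ratio is inversely proportional to it, which is the pattern
seen down each column of Table 2.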

Rigorous  listening  tests  require  a  trained  panel  of  listeners,  which  was  not  available 
for  our  study.  For  this  reason,  the  only  extensive  listening  tests  performed  were  informal 
ones  by  one  untrained,  but  knowledgeable,  listener.  The  goals  of  these  tests  were  to  listen 
for  any  unexpected  auditory  phenomena  and  to  validate  results. 

7.3.2  Discussion of results.  The results of the large-scale analyses are,
overall, encouraging. Tests show that approximations based on the frame
developed in Chapter IV can be used to represent speech. Such approximations
based on lower numbers of poles per glottal pulse have inherent noise
suppression characteristics, and yet are able to represent fricatives (best
described as colored noise) well when used to approximate clean speech. While a
significant number of poles is required for a good approximation to speech, an
intelligible approximation is achievable with far fewer.

The results are presented in three sections below. The first deals with numeric
results. These results are analyses of the L2 norm differences between signals
and approximations. The next section deals with differences visible in
spectrograms, which reveal some of the implications of the algorithms. The
numeric and spectrogram results must be considered suggestive until verified
with listening tests. The results of some rudimentary listening tests are
presented in the last section.

Note  that  because  the  pole  selection  algorithm  is  a  heuristic,  a  better  heuristic, 
perhaps  one  based  on  physiological  considerations,  might  improve  the  results  presented 
herein. 


7.3.2.1  Numeric results.  Examining Figures 56 and 57, one can see that as
the number of basis elements in the approximation increases, the difference
between the signal and its approximation decreases monotonically. This occurs in
both the noisy and clean signal approximations. This is expected, since with
more basis functions the approximation should improve, whether the original data
were clean or noisy. The results for differing analysis window decay rates and
overlaps were very similar, although some audible qualitative differences were
noted, as we shall discuss below.

Some more interesting behavior is seen in these figures in comparing the
approximation of the noisy signal to the clean signal. For the cases of one,
two, three, and six poles per glottal pulse in the approximation, the
approximation of the noisy signal is closer (in L2 norm) to the clean signal
than to the noisy signal on which it is based. For the cases with 10 poles per
glottal pulse, sufficient basis functions are used that the approximations of
the noisy signal are closer to the noisy signal than to the clean signal, which
is what one would expect with an increasing number of basis functions. This
behavior supports the contention that this representation can more easily
represent speech than most non-speech (in this case, Gaussian white noise).
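
The comparison described here, measuring how far an approximation of the noisy
signal lies from the clean signal and from the noisy signal, can be illustrated
with a toy experiment. The sketch below does not use the pole-based frame of
Chapter IV. As a stand-in, it keeps the k largest DFT coefficients of a noisy
two-tone signal, and every name and parameter in it is an assumption made for
the example.

```python
import numpy as np

def top_k_dft_approx(x, k):
    """Stand-in approximation: keep the k largest one-sided DFT bins."""
    spec = np.fft.rfft(x)
    keep = np.argsort(np.abs(spec))[-k:]
    sparse = np.zeros_like(spec)
    sparse[keep] = spec[keep]
    return np.fft.irfft(sparse, n=len(x))

def l2_distance(a, b):
    """L2 norm of the difference between two signals."""
    return np.linalg.norm(a - b)

rng = np.random.default_rng(1)
n = 1024
t = np.arange(n)
# Toy "clean" signal: two tones falling exactly on DFT bins 20 and 45.
clean = np.sin(2 * np.pi * 20 * t / n) + 0.5 * np.sin(2 * np.pi * 45 * t / n)
noise = rng.standard_normal(n)
# Scale the noise for a 6 dB SNR measured with L2 norms.
noise *= np.linalg.norm(clean) / (10 ** (6 / 20) * np.linalg.norm(noise))
noisy = clean + noise

for k in (2, 50, n // 2 + 1):
    approx = top_k_dft_approx(noisy, k)
    print(k, l2_distance(approx, clean), l2_distance(approx, noisy))
```

With few retained elements the approximation captures mostly the structured
part of the signal and so sits nearer the clean signal; once every element is
retained it reproduces the noisy signal itself. The same two distances, computed
for the pole-based approximations, are what the figures compare.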

Figures 58 and 59 show even more interesting results. As with the previous case,
the approximation to the noisy speech is initially closer (in L2 norm) to the
clean speech than to the noisy speech on which it was based. For each SNR
examined, the point at which the approximation to the noisy speech becomes
closer to the noisy speech than to the clean speech occurs somewhere between six
and 10 poles per glottal pulse.

Of more importance in these figures is the point at which the approximation to
the noisy speech most closely resembles (in an L2 norm sense) the clean speech.
It appears that this minimal point depends on the level of noise, with the SNR
and the number of poles in the minimal approximation being positively
correlated. This result has some important implications for speech de-noising
techniques.

An additional result, also expected, seen in Figures 58 and 59 is that for large
SNR, until sufficiently many poles are added to the approximation, the noisy
signal is a better approximation to the clean one than is the approximation of
either the noisy or the clean signal. However, for a low SNR, even an
approximation (of noisy or clean signal) based on one pole per glottal pulse is
closer, in L2 norm, to the clean signal than is the noisy one.

7.3.2.2 Spectrogram results. Examining the spectrograms in Figure 60, the
first thing to be noted is that increasing the number of poles in the approximation yields
a spectrogram that more closely represents the original. With one pole per glottal
pulse, only one formant is clearly picked up. With two poles, two formants are visible in
some places, while in others the second pole contributes to further refinement of the first
formant. Even with three poles per glottal pulse, the second formant is not completely
filled in and is mostly missing where it was of low amplitude. This clearly identifies an
instance in which further refinements of the pole-picking heuristic are likely to produce an
audible improvement in the results.

The spectrograms of Figure 61 are an illustration of the noise-suppressing
characteristics of these approximations. In the areas with a strong formant structure, the first
several poles of the approximation go towards refining the formant structure. The noise
in these portions of the approximation does not become apparent in the spectrograms until
after a large number of poles has been used.

An especially nice example of the effect Gaussian noise has on fricatives in these
approximations can be seen by comparing the spectrograms of Figures 60 and 61. Following
the development, with additional poles, of the fricatives /JH/ (“j” in “enjoy”) and /SH/
(“t” in “audition”), located in the intervals (0.47,0.53) and (1.62,1.72), respectively, one
can see in Figure 60 that, for clean speech, even one pole per glottal pulse is enough to begin
filling in the higher frequency noise that represents the fricative. However, for approximations
to noisy speech, as in Figure 61, the colored noise of the fricative is not well represented,
seemingly being masked by the Gaussian white noise added to the signal.

Figure 62 is a comparison of the spectrograms of six poles per glottal pulse
approximations of clean and noisy speech for various SNRs. As can be seen here, with increasing
levels of noise, the poles in the approximation are less likely to be used in refining the
formant structure and more likely to go towards representing the noise.

Figure 63 is included to show the effects of the amount of overlap between analysis
windows. In particular, comparing the spectrograms of the approximations to the
spectrogram of the original, particularly in the areas of harmonic speech, one can see vertical
striations marking the edges of the analysis windows. These striations are most prominent
for the approximation with non-overlapping windows and least apparent for the windows
overlapping by 67%. It is thought that these striations may represent the static-like
distortion, discussed below, reported in approximations with fewer poles and non-overlapping
analysis windows.

7.3.2.3 Perceptual results. These tests are qualitative, not quantitative, in
nature, and there is a strong subjective component to the results. As a reminder, these
listening tests were not done with a trained panel, but rather with a single knowledgeable, but
untrained, test subject. Also, these were not blind tests in that the subject knew a great
deal about the expected results. Therefore, these results may vary somewhat from those
with other test subjects.

General quality. General listening tests were performed to evaluate the
quality of the approximations. While the results for lower numbers of poles per glottal pulse
were generally better than expected, the results for higher numbers of poles per glottal
pulse were slightly disappointing in that the approximations were not of as high quality
as hoped. Tests on varying overlaps between analysis windows indicated an advantage to
fixed-size, overlapping windows over non-overlapping windows.

Surprisingly, even the approximations based on one pole per glottal pulse sound
speech-like. While a naive subject would not be able to understand such an
approximation, it can be understood if one knows the text of the sentence. Approximations based
on two poles per glottal pulse are borderline in intelligibility, and those based on three
poles per glottal pulse are definitely intelligible. However, despite being intelligible, these
approximations do not sound good because of distortions introduced by the linear
approximation operation.

A surprising result was that, for clean speech, a single pole per glottal pulse was
sufficient to give a good rendering of the fricative /SH/ in the sentence sxl94. This
is surprising because this sound has very little harmonic nature and is best described as
colored Gaussian noise. However, in speech to which noise had been added, approximations
with a single pole per glottal pulse do not capture this fricative at all. Spectrograms of
some of the approximations where this is apparent are shown in Figures 60 and 61, where
the fricative /SH/ occurs during the time interval (1.62,1.71).

Comparing the qualities of approximations of the two sentences, the listener
suggested that an approximation based on six poles per glottal pulse of the sentence sal
was comparable in quality to an approximation based on 10 poles per glottal pulse of the
sentence sxl94. Also, an approximation of sal based on 10 poles per glottal pulse was
comparable in quality to one of sxl94 with 15 poles per glottal pulse. The reasons for this
disparity are not known.

The distortion present in the approximations based on lower numbers of poles has a
tonal quality, like random, short duration tones. The descriptions of it vary. One person
said that it sounded like a child playing a xylophone in the background. Another said that
it sounded like the music from several merry-go-rounds playing simultaneously. Regardless,
as the number of poles in the approximation increases, this distortion is reduced.

In  approximations  of  noisy  speech,  with  increasing  poles,  the  distortion  is  reduced 
as  the  representation  of  the  noise  improves,  leading  to  a  trade-off  between  tonal  distortion 
and  accurately  reproduced  noise.  This  trade-off  will  be  discussed  in  more  detail  below. 

As an additional note, in these tests there was a difference in the quality of the distortion
in approximations of female speech (sal) and male speech (sxl94). That on the female
speech was noted to be more “whistly” and that on the male, more “garbly.” Since only
two speakers were used, the sample is too small to generalize this observation. Possible
explanations include pitch and formant location differences, with the female’s voice having
both higher pitch and formants at higher frequencies, possibly leading to the distortions
being concentrated in a higher frequency range.

Quality of speech segments. The listener also listened to short
segments of the approximations to determine which segments were most easily represented by
the approximations.

For sentence sal, the segments “wash water” and “all year” were captured almost
perfectly  in  the  six-pole  approximation,  with  the  10  pole  approximation  being  only  slightly 
better.  However,  of  the  tests  performed,  it  required  the  10  pole  approximation  to  get  a 
good  rendering  of  “dark  suit”  and  the  15  pole  approximation  for  a  good  rendering  of  “she 
had.” 

Segments of the sentence sxl94 required higher-pole approximations to achieve good
quality. The segments “they en” and “joy it” (excerpted from “they enjoy it”) had
noticeable degradation in the six pole approximation, but were captured fairly well in the 10 pole
approximation. The degradation noticeable in the six pole approximation of the segment
“dition” (excerpted from “audition”) is not as noticeable in the 10 pole approximation.
The “garbly” degradation is audible in both the six pole and 10 pole approximations of
the segment “I au” (excerpted from “I audition”).

An  additional  perceptual  comment  was  that,  in  these  shorter  segments,  the  distortion 
in  the  approximations  was  often  not  objectionable.  In  longer  segments,  the  same  distortion 
became  annoying. 

Best approximations to noisy speech. Figures 58 and 59 indicate that
for a given level of noise, there is a best number of poles per glottal pulse, such that the
L2 difference between the approximation and the clean speech is minimized. Listening
tests were performed to test the hypothesis that this best approximation sounds better
than the other approximations. Comparisons were also made between approximations of
speech with differing noise levels but similar L2 differences from the clean speech.

As seen in Table 3, the results of the listening tests comparing approximations of
noisy speech are as expected. In every case but one, the approximation identified
as sounding best was the closest or second closest in L2 norm to the clean signal. This
implies some perceptual validity to the L2 norm in such comparisons.
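As a small worked check, the pole count that minimizes the L2 difference at each noise level can be read off the Table 3 values for sentence sal. The values below are copied from that table; the variable names are illustrative only.

```python
# L2 differences from the clean speech for sentence sal, copied from Table 3.
poles = [1, 2, 3, 6, 10, 15]
l2_sal = {
    "clean": [0.687702, 0.464619, 0.342174, 0.197260, 0.119157, 0.090360],
    "10 dB": [0.685619, 0.474355, 0.365598, 0.270047, 0.255672, 0.264063],
    "6 dB":  [0.685046, 0.492053, 0.409262, 0.368374, 0.389134, 0.414692],
    "3 dB":  [0.709586, 0.548095, 0.493598, 0.502714, 0.551639, 0.589132],
    "0 dB":  [0.750175, 0.635339, 0.620351, 0.692969, 0.772717, 0.827508],
}

# For each noise level, find the number of poles with the smallest L2 difference.
best_poles = {}
for condition, values in l2_sal.items():
    index_of_minimum = min(range(len(values)), key=values.__getitem__)
    best_poles[condition] = poles[index_of_minimum]

for condition, count in best_poles.items():
    print(condition, count)
```

The minimizing pole counts are 15 (clean), 10 (10 dB), 6 (6 dB), 3 (3 dB), and 3 (0 dB), so the best number of poles does not increase as the SNR falls.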

As  can  be  noted  from  Figures  58  and  59  and  Table  3,  the  L2  norm  differences  between 
the  clean  speech  and  approximations  based  on  one  pole  per  glottal  pulse  of  both  clean  and 
noisy  speech  are  very  close.  Perceptually,  this  is  also  true,  with  the  one  pole  approximations 
being  deemed  of  equal  quality  regardless  of  the  noise  level  (of  those  used  in  our  tests)  of 
the  original  signal.  Note  that,  while  perceived  as  speech-like,  these  approximations  are  not 
intelligible. 

The tests comparing approximations with similar L2 differences revealed a weakness
of using the L2 norm to predict perceptual differences. The three pole approximation of
the 3 dB SNR sal was judged to be the same quality as the two pole approximation to
the 10 dB SNR sal, which was expected. However, the six pole approximation to the 10
dB SNR sal was much more intelligible than the one pole approximation to the 6 dB SNR
sal, despite the latter being slightly closer to the clean speech in L2 norm. This shows that
even among the approximations used here, the L2 norm is not always a reliable indicator
of perceptual differences.

Overlapping analysis windows. Listening tests were performed to
determine the advantage, if any, of using overlapping analysis windows. It was found that
there is little advantage to using overlapping analysis windows when working with
approximations of clean speech, but that they produce better sounding results when working with
noisy speech.

In clean speech, it was found that non-overlapping windows produce an
approximation with a static-like distortion, which was reduced with increasing numbers of poles in the
approximation. Overlapping analysis windows produced a “talking in a megaphone”
distortion, with windows overlapping by 67% having more distortion than those overlapping
by 50%. Again, this distortion decreased with increasing numbers of poles in the
approximation. So, the difference between overlapping and non-overlapping analysis windows
appears to be a choice between types of distortion.
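The window layouts compared here (0%, 50%, and 67% overlap, with the exp(-0t), exp(-50t), and exp(-100t) windows named in Figures 56 and 57) can be sketched as follows. This is a hedged illustration assuming fixed-size windows on a uniformly sampled signal; the function names, window length, and sample rate are hypothetical, and how the decaying window enters the actual analysis is not restated in this section.

```python
import numpy as np

def window_starts(n_samples, window_len, overlap):
    """Start indices of fixed-size analysis windows with the given fractional overlap."""
    hop = max(1, int(round(window_len * (1.0 - overlap))))
    return list(range(0, n_samples - window_len + 1, hop))

def decaying_window(window_len, sample_rate, decay):
    """Samples of exp(-decay * t) over one window; decay = 0 gives a rectangular window."""
    t = np.arange(window_len) / sample_rate
    return np.exp(-decay * t)

n_samples, window_len, sample_rate = 900, 300, 16000   # illustrative values only
for overlap in (0.0, 0.5, 2.0 / 3.0):
    starts = window_starts(n_samples, window_len, overlap)
    print(f"{overlap:.0%} overlap: {len(starts)} windows, starts {starts}")

# One windowed segment of a placeholder signal, using the exp(-50t) window.
signal = np.ones(n_samples)
segment = signal[0:window_len] * decaying_window(window_len, sample_rate, 50.0)
```

With these illustrative numbers, the hop between windows is 300, 150, and 100 samples respectively, so a larger overlap means more windows and therefore more poles in total for the same number of poles per window.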

For noisy speech, however, having overlapping analysis windows improved overall
quality. For example, the 10 pole approximation with non-overlapping windows to the
sentence sxl94 with a noise level of 6 dB SNR was deemed awful, while the 50% and 67%
overlapping windows produced a much better result.




[Figure 56 plots: panels for 0% and 50% window overlap with exp(-0t), exp(-50t), and
exp(-100t) windows; vertical axis: L2 difference; horizontal axis: number of poles used.]

Figure 56. Normalized error for speech approximations (sal), for clean speech and 6 dB
SNR Gaussian white noise-added speech, for varying window overlaps, decay
rates, and numbers of poles used. Data points denoted “c” mark the
normalized difference between the clean speech and the approximation to clean
speech. Similarly, “n” marks the difference between the noisy speech and the
approximation of the noisy speech, and “d” marks the difference between the original
clean speech and the approximation to the noisy speech.




[Figure 57 plots: panels for 0%, 50%, and 67% window overlap with exp(-0t), exp(-50t), and
exp(-100t) windows; vertical axis: L2 difference; horizontal axis: number of poles used.]

Figure 57. Normalized error for speech approximations (sxl94), for clean speech and 6 dB
SNR Gaussian white noise-added speech, for varying window overlaps, decay
rates, and numbers of poles used. Data points denoted “c” mark the
normalized difference between the clean speech and the approximation to clean
speech. Similarly, “n” marks the difference between the noisy speech and the
approximation of the noisy speech, and “d” marks the difference between the original
clean speech and the approximation to the noisy speech.




[Figure 58 plots: panels (a) 10 dB SNR, (b) 6 dB SNR, (c) 3 dB SNR, (d) 0 dB SNR; vertical
axis: L2 difference; horizontal axis: number of poles used (1, 2, 3, 6, 10, 15); curves
labeled “c”, “n”, and “d”.]

Figure 58. Normalized error for speech approximations (sal), for varying SNR Gaussian
white noise-added speech for analysis window overlaps of 50%, window decay
rates of 0, and varying numbers of poles used. Data points denoted “c” mark
the normalized difference between the clean speech and the approximation
to clean speech. Similarly, “n” marks the difference between the noisy speech and
the approximation of the noisy speech, and “d” marks the difference between
the original clean speech and the approximation to the noisy speech. The
horizontal line marks the L2 difference between the clean and noisy signal in
each case. The SNRs of the data used are (a) 10 dB, (b) 6 dB, (c) 3 dB, and
(d) 0 dB.




[Figure 59 plots: panels (a) 10 dB SNR, (b) 6 dB SNR, (c) 3 dB SNR, (d) 0 dB SNR; vertical
axis: L2 difference; horizontal axis: number of poles used (1, 2, 3, 6, 10, 15); curves
labeled “c”, “n”, and “d”.]

Figure 59. Normalized error for speech approximations (sxl94), for varying SNR
Gaussian white noise-added speech for analysis window overlaps of 50%, window
decay rates of 0, and varying numbers of poles used. Data points denoted “c”
mark the normalized difference between the clean speech and the
approximation to clean speech. Similarly, “n” marks the difference between the noisy speech
and the approximation of the noisy speech, and “d” marks the difference between
the original clean speech and the approximation to the noisy speech. The
horizontal line marks the L2 difference between the clean and noisy signal in
each case. The SNRs of the data used are (a) 10 dB, (b) 6 dB, (c) 3 dB, and
(d) 0 dB.




Figure 60. Spectrograms of approximations of clean speech (sxl94). Shown are the spectrogram of (a) the original clean
speech, and spectrograms of the approximations with (b) one pole, (c) two poles, (d) three poles, (e) six poles,
(f) 10 poles, and (g) 15 poles per glottal pulse.


Figure 62. Spectrograms of approximations of speech with different levels of Gaussian
white noise (sal). Each approximation is with six poles per glottal pulse.
Shown are the spectrogram of (a) the approximation of clean speech, and the
spectrograms of the approximations of noisy speech with an SNR of (b) 10 dB,
(c) 6 dB, (d) 3 dB, and (e) 0 dB.




Figure 63. Spectrograms of approximations of clean speech with differing window
overlaps (sal). All approximations are with six poles. Shown are (a) clean speech,
and approximations based on (b) 0%, (c) 50%, and (d) 67% overlap between
analysis windows.




Sentence sal

Poles per
glottal    Clean          10 dB          6 dB           3 dB           0 dB
pulse      speech         SNR            SNR            SNR            SNR
 1             0.687702       0.685619       0.685046       0.709586       0.750175
 2             0.464619       0.474355       0.492053       0.548095       0.635339
 3             0.342174       0.365598       0.409262   +   0.493598   +*4 0.620351
 6             0.197260       0.270047   +*2 0.368374   *3  0.502714       0.692969
10             0.119157   +*1 0.255672       0.389134       0.551639       0.772717
15         +*  0.090360       0.264063       0.414692       0.589132       0.827508

Sentence sxl94

Poles per
glottal    Clean          10 dB          6 dB           3 dB           0 dB
pulse      speech         SNR            SNR            SNR            SNR
 1             0.669009       0.675625       0.681489       0.700375       0.749570
 2             0.520508       0.532727       0.550816       0.593328       0.681335
 3             0.427633       0.453797       0.493363       0.544308   +*8 0.671220
 6             0.290861       0.350330   *6  0.426135   +*7 0.531198       0.719564
10             0.198787   *5  0.295894   +   0.408046       0.549081       0.775857
15         +*  0.129461   +   0.274082       0.413903       0.580043       0.821027

Listener’s comments

1  “10 poles” slightly better. “15 poles” has too much noise.

2  “6 poles” significantly better than “3 poles”; better, but not significantly, than “10
poles.”

3  “6 poles” very slightly better than “3 poles,” possibly because “6 poles” sounds
louder.

4  “3 poles” better than “2 poles.” “6 poles” has too much noise.

5  “10 poles” slightly better, but leaning more towards “15 poles” than “6 poles.”

6  “6 poles” slightly better, but leaning more towards “10 poles” than “3 poles.”

7  “6 poles” noticeably better than others.

8  “3 poles” slightly better than others.


Table 3. L2 difference between clean speech and approximations of clean speech and
approximations of noisy speech with varying SNR. The approximation preferred
out of the set of approximations of a sentence with a given level of noise is marked
by the symbol *; a number following the * refers to the listener’s comments. For
easy comparison, the approximations with the lowest L2 norm from the clean speech
are marked with the symbol +.




7.4  Summary 

While  the  results  achieved  in  these  experiments  were  very  good  and,  in  general, 
matched  expectations,  there  were  some  unexpected  results  also. 

For the fine-scale analyses, it was found that short segments of harmonic speech
are represented better (lower L2 error with fewer poles) than similar segments of
non-harmonic speech and non-speech-like signals. It was also found that the alignment of the
speech segment within the window made a noticeable difference in performance of the
representation, with the worst alignments requiring many more poles to achieve the same
accuracy as for the best alignments. These results were as expected.

The  medium-scale  analysis  produced  some  expected  and  some  unexpected  results. 
The  iterative  algorithm  used  to  find  the  frame  representations,  given  in  Theorem  4.3.4, 
was  found  to  converge  quickly,  with  only  negligible  improvements  after  5  iterations  - 
a  desired  result.  Examining  the  representations  of  harmonic  speech,  it  was  found  that 
there  is  sometimes  an  advantage  (lower  error  with  fewer  poles)  to  having  overlapping 
analysis  windows.  However,  comparing  the  representations  of  harmonic  speech  to  those  of 
signals  other  than  harmonic  speech,  some  disturbing  anomalies  were  observed.  First,  the 
accuracy  of  the  representation  of  reversed  speech  was  on  par  with  that  of  the  speech  in 
the  correct  order.  Second  and  more  disturbing,  a  distinctly  non-speech  signal  was  better 
represented  than  harmonic  speech.  Further  investigation  suggested  that  this  is  due  to  the 
uncomplicated  nature  of  the  Fourier  transform  of  this  non-speech  signal. 

While the results of the fine- and medium-scale analyses were suggestive, the results
of the large-scale analysis are the most meaningful since they involved listening tests. It
was found that approximation in this representation has inherent noise-suppression
characteristics - that is, the speech is represented more naturally than the noise. Specifically, low
order approximations of noisy speech were found to be both numerically and perceptually
closer to the clean speech than to the speech corrupted by noise. For representations of
speech corrupted by noise, it was found that the L2 norm difference between the
representation and the clean speech corresponded well with the perceptual difference, lending
support to the hypothesis that, when using this frame for representations, the L2 norm
can be used as a reliable predictor of perceptual differences.




VIII. Conclusions


This work presented new results from several areas of mathematics, applied to the
problem of mathematical speech representation. Representations in Hp(D) were used to
develop frames in H2(D). These frames were then used in the construction of frames
for L2(R) suitable for use in speech processing. A generalization of the frame operator
was presented which allows for combinations of iterative and exact solutions of frame
representations to be used.

The Carleson inequality proven in Theorem 3.1.4 allows for bounds on sums of
sampled values of elements of the Hardy spaces Hp(D), 1 < p < ∞, where the sampling
locations are allowed to lie in closed subsets of D of more general shapes, rather than on
the simple concentric circles in D in previous theorems. This relaxation of the locations of
the sampling points allows this theorem to be used to create more versatile representation
theorems. Theorem 3.6.1 is an example of such a representation theorem for elements of
the Hardy spaces.

Similar to mathematical bases, mathematical frames are used to obtain
representations in Hilbert spaces. Frames for H2(D) were found in Theorems 5.2.7 and 5.2.10 based
on Theorem 3.6.1. These frames are suitable for a wide variety of representation
applications in D. They are also suitable for representation in other spaces via transforms, which was
how they were used in this work. Exact forms of projections into these frames are also
determined.

A composite mathematical frame for L2(R) was developed in Theorem 4.2.4 which
was shown to be useful in representing digitally recorded speech. It should be particularly
useful in signal processing applications where phenomena of varying time duration and of
changing characteristics are frequently found.

Of particular use in a wide range of applications far beyond those of this paper
is the generalization of the frame operator presented in Theorem 4.3.1, together with the
iterative method for finding representations given in Theorem 4.3.4. This provides a sound
basis for combining exact and iterative methods of determining frame expansion coefficients
to enable faster calculations.
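For orientation, the kind of iteration involved can be illustrated with the classical frame algorithm in a finite-dimensional setting. This sketch is not the generalized frame operator of Theorem 4.3.1 or the method of Theorem 4.3.4; it is the standard textbook iteration f_{n+1} = f_n + (2/(A+B)) S(f - f_n), where S is the frame operator and A, B are the frame bounds. The function name and the small example frame are assumptions made for illustration.

```python
import numpy as np

def frame_algorithm(frame_vectors, f, n_iter):
    """Classical frame algorithm in R^n (a stand-in, not the method of Theorem 4.3.4).

    Recovers f from its frame coefficients <f, phi_k> by iterating
    f_{n+1} = f_n + (2 / (A + B)) * S(f - f_n), with S the frame operator.
    """
    F = np.asarray(frame_vectors, dtype=float)   # rows are the frame vectors phi_k
    f = np.asarray(f, dtype=float)
    S = F.T @ F                                  # S g = sum_k <g, phi_k> phi_k
    eigenvalues = np.linalg.eigvalsh(S)          # ascending order
    lower_bound, upper_bound = eigenvalues[0], eigenvalues[-1]   # optimal A and B
    relaxation = 2.0 / (lower_bound + upper_bound)
    coefficients = F @ f                         # the only data used about f
    S_f = F.T @ coefficients
    estimate = np.zeros_like(f)
    errors = []
    for _ in range(n_iter):
        estimate = estimate + relaxation * (S_f - S @ estimate)
        errors.append(float(np.linalg.norm(f - estimate)))
    return estimate, errors

# A small non-tight frame for R^2 (illustrative only).
frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = np.array([1.0, -2.0])
estimate, errors = frame_algorithm(frame, target, n_iter=5)
print(errors)
```

For this example frame the bounds are A = 1 and B = 3, so the error contracts by the factor (B - A)/(B + A) = 1/2 at every iteration. Frames whose bounds are closer together converge faster.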

The utility of the mathematics developed here was shown via an application to speech
representation. A frame especially suitable for speech representation was developed based
on the adaptive L2(R) frame of Theorem 4.2.4. The frame for L2(R+) used as a building
block in this speech frame was a transform of an H2(D) frame.




A  computer  program  was  written  which  could  adaptively  fit  a  composite  frame  to  a 
specific  example  of  speech  and  project  the  speech  onto  this  frame.  This  frame,  rather  than 
being for all of L2(R), was instead for a finite-dimensional subspace of L2(R), allowing
approximations  to  speech  to  be  found.  The  sampling  points  of  the  H2(D)  frame  used  in 
the  construction  were  chosen  such  that  they  corresponded  with  exactly  known  points  of 
the  Laplace  transform  of  the  windowed  signal  to  be  represented.  Tests  on  clean  and  noisy 
speech  and  non-speech  signals  showed  the  usefulness  of  the  constructed  frame  in  speech 
representation. 




Appendix A. A Space of Speech


A.1 Abstract Speech Space

Presented in this section is an abstract metric space designed to contain useful
representations of speech. “Useful” in this context means that differences as measured by the
metric are well correlated with perceptual differences as perceived by a listener.
Additionally, the coordinates of this space have some physiological meaning.

A.1.1 Desired characteristics of the space. We want an abstract metric space,
(S,d), where S is a set and d is a metric, such that the set represents speech well. The
space is abstract in that the exact makeup of the set S and the metric d are not known,
although the characteristics they must satisfy are quantified. Concrete examples of this
space, where the makeup of the space is fully specified, can be used for applications.

Definition A.1.1 A metric defined on a set $S$ is a function $d : S \times S \to [0,\infty)$ such that

1. $d(s_1,s_2) \ge 0$ and $d(s_1,s_1) = 0$ for all $s_1,s_2 \in S$,

2. $d(s_1,s_2) = 0$ iff $s_1 = s_2$ for all $s_1,s_2 \in S$,

3. $d(s_1,s_2) = d(s_2,s_1)$ for all $s_1,s_2 \in S$, and

4. $d(s_1,s_3) \le d(s_1,s_2) + d(s_2,s_3)$ for all $s_1,s_2,s_3 \in S$.

A pseudo-metric is such a function for which conditions 1), 3), and 4) hold, and a
quasi-metric (a.k.a. skew-metric) is such a function for which conditions 1), 2), and 4)
hold.
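The four conditions can be checked numerically on a finite sample of points, which is a convenient sanity test for a candidate metric on a concrete speech space. The sketch below is illustrative only: the function names and the two example distance functions are hypothetical, and a finite check can refute a condition but cannot prove that it holds on all of S.

```python
from itertools import product

def check_metric_conditions(points, d, tol=1e-12):
    """Test conditions 1-4 of Definition A.1.1 on a finite sample of points."""
    pairs = list(product(points, repeat=2))
    triples = product(points, repeat=3)
    return {
        # 1: non-negativity and zero distance from a point to itself
        1: all(d(a, b) >= -tol for a, b in pairs)
           and all(abs(d(a, a)) <= tol for a in points),
        # 2: zero distance exactly when the two points coincide
        2: all((abs(d(a, b)) <= tol) == (a == b) for a, b in pairs),
        # 3: symmetry
        3: all(abs(d(a, b) - d(b, a)) <= tol for a, b in pairs),
        # 4: triangle inequality
        4: all(d(a, c) <= d(a, b) + d(b, c) + tol for a, b, c in triples),
    }

def classify(conditions):
    """Name the strongest notion from Definition A.1.1 consistent with the checks."""
    if all(conditions.values()):
        return "metric"
    if conditions[1] and conditions[3] and conditions[4]:
        return "pseudo-metric"
    if conditions[1] and conditions[2] and conditions[4]:
        return "quasi-metric"
    return "none"

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 3.0)]

def euclidean(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def first_coordinate_only(a, b):
    # Ignores the second coordinate, so distinct points can be at distance zero.
    return abs(a[0] - b[0])

print(classify(check_metric_conditions(points, euclidean)))
print(classify(check_metric_conditions(points, first_coordinate_only)))
```

The second distance function fails only condition 2, which is the behavior of a pseudo-metric: it treats some distinct elements as identical.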

The  space  should  represent  speech  well  in  that  the  metric  d  will  provide  a  good 
quantifier  of  the  perceptual  differences  between  instances  of  speech  represented  in  the 
space.  The  set  S  should  be  as  small  as  possible,  and  yet  it  is  desirable  for  any  one-speaker- 
at-a-time  speech  to  be  exactly  representable  in  it.  Additionally,  it  is  desirable  that  the 
set  S  exhibit  mathematical  structure  (e.g.,  algebraic  structure,  convexity,  etc.)  Also,  the 
representation  should  be  “natural”  in  that  speech  is  more  compactly  represented  in  the 
space  than  non-speech. 

Because speech synthesis researchers have been enjoying success, a logical inspiration
for S could be a variant of the speech production model.

The metric or pseudo-metric d may actually be one of a class of
metrics/pseudo-metrics defined on S. This map, d, should be able to quantify (as much as possible)
perceivable differences. This will make d suitable for “quality” type discriminations: two
sounds that are very close under d will be perceived by a human listener to be
indistinguishable.

Many  spaces  will  meet  the  desired  characteristics.  Below,  an  abstract  speech  space 
is  developed  based  on  the  simple  speech  production  model  discussed  in  Section  2.1.2,  in 
which  the  constraints  that  must  be  met  are  discussed.  The  set  upon  which  this  space  is 
based  is  convex.  Elements  of  this  space  can  be  “produced”  into  speech  by  a  non-linear 
map,  S,  which  is  defined.  Concrete  speech  spaces  can  be  created  from  this  abstract  space 
by  further  quantifying  its  elements. 

A.1.2 Description of the Abstract Speech Space, (S, d). The speech space defined
here, denoted by (S, d), is a four-component Cartesian-product set S = G × A × P × V
with an associated metric, d. The component set G contains characterizations of the
glottal excitation source shape, A contains characterizations of the amplitude of the voice, P
contains characterizations of the fundamental pitch of the voice, and V contains
characterizations of the vocal tract response. Each of the component sets, G, A, P, and V, is
described in detail below.

Since the speech production process is of finite duration, it is desirable to model the
interval on which “speech production” happens as a finite interval I, although the effects
of speech production can be considered to be infinite in duration, if so desired. Without
loss of generality, the interval I may be assumed to be I = [0,T] for some terminal time T.

This space is related to the simple speech production model (Figure 4) in the
following way. The system response of the linear system “vocal tract” is represented by the
component set V. The glottal excitation source is represented by the three component sets
G, A, and P.

Breaking the glottal excitation into three component parts, G, A, and P, effectively separates the excitation into three components for which there exist extensive measurements of human sensitivity [16]. For instance, it is known that the human ear cannot detect small pitch changes below a certain threshold. Therefore, “small” changes in the P-component (“small” to be quantified in the development of the metrics) should be difficult to detect (i.e., should not degrade quality). Likewise, “small” amplitude changes, represented in the A-component, should be difficult to detect. Experience with linear prediction coding (LPC) has shown that the human auditory system is not extremely sensitive to small changes in LPC coefficients; I suspect that this can be extended to “small” differences in a vocal tract response.

Under the assumption that “small” changes in any of these components are imperceptible to the human ear, and that any glottal excitation can be represented with them, it seems likely that good metrics could be developed based on this representation.

This  decomposition  of  the  glottal  excitation  into  component  parts  is  related  to  that 
in  [26],  where  the  frequency  content  (which  is  closely  related  to  pitch)  is  manipulated 
separately,  and  is  similar  to  that  in  [17],  where  the  instantaneous  waveform  is  related 
in  concept  to  the  glottal  excitation  source  shape  used  here,  except  that  there  it  is  not 
normalized  for  pitch. 

A.1.2.1 Glottal excitation shape. For voiced speech, the glottal excitation roughly resembles a periodic signal, where the shapes of the pulses and the period vary with time. Suppose, however, that one were to freeze the vocal cord and air flow parameters at time t, and examine the glottal excitation produced by this configuration. This instantaneous glottal excitation at time t would be a periodic signal. If, then, one were to examine how the instantaneous glottal excitation varied within one pitch cycle with time, it might be possible to put constraints on the rate of shape change that would model the constraints imposed by physiology.

Based on this idea, I define the glottal excitation source shape set, G, as a set of real valued functions on $I \times \mathbb{R}$ which are periodic with period one in the second coordinate. For voiced speech, each period (in the second coordinate) represents one glottal pulse, with the first coordinate representing how the pulse shape is changing with time. For unvoiced speech, what a “period” represents is less clear, but the turbulence producing the unvoiced excitation will still be modeled in this manner. Because it is desirable for the speech map, S, to be defined on the whole space, it is necessary (due to the energy normalization which will be seen in the definition of S) for the 0 function to be excluded. At the same time, it is desirable to maintain convexity, to preserve useful space structure for projections into the space. Therefore, the set G will further be constrained to functions that, for all t, have a positive mean in the second coordinate. That is,

\[
\inf_{t \in I} \int_0^1 g(t,\tau) \, d\tau > 0 .
\]

Also, we wish to add some constraints which reflect physiological realities on the set G. The exact constraint will vary with the particular instantiation chosen. However, the constraint functional, $\phi_G$, must be convex if we wish the subset of the convex set G defined by $\{g \in G : \phi_G(g) \le K\}$ to also be convex, where K is some positive constant. That is,

\[
\phi_G(\alpha g_1 + (1 - \alpha) g_2) \le \alpha \phi_G(g_1) + (1 - \alpha) \phi_G(g_2) \quad \forall\, g_1, g_2 \in G,\ 0 \le \alpha \le 1 .
\]

Since it will be necessary to energy-normalize elements of G to “produce” speech, the set G will have to be restricted to functions that satisfy

\[
\int_0^1 |g(t,\tau)|^2 \, d\tau < \infty \quad \forall\, t \in I .
\]

Due to the normalization to be done later, this may be further restricted to

\[
\int_0^1 |g(t,\tau)|^2 \, d\tau \le 1 \quad \forall\, t \in I .
\]

Additionally, for the speech map, S, to be well-defined, it will be necessary to bound the elements of G. That is,

\[
\sup_{(t,\tau) \in I \times [0,1]} |g(t,\tau)| < \infty .
\]

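The set G can be defined as

\[
\begin{aligned}
G = \Bigl\{\, g: I \times \mathbb{R} \to \mathbb{R} \;\Bigm|\;
& g(t,\tau) = g(t,\tau + 1) \ \ \forall\, (t,\tau) \in I \times \mathbb{R}, \\
& \int_0^1 |g(t,\tau)|^2 \, d\tau \le 1 \ \ \forall\, t \in I, \\
& \phi_G(g) \le K_G, \\
& \sup_{(t,\tau) \in I \times [0,1]} |g(t,\tau)| < \infty, \text{ and} \\
& \inf_{t \in I} \int_0^1 g(t,\tau) \, d\tau > 0 \,\Bigr\} .
\end{aligned}
\]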

Figure  64(a)  shows  a  simplified  example  of  a  “voiced”  element  of  one  such  set.  To 
show  convexity  of  the  set  G,  the  following  proposition  is  necessary. 

Proposition A.1.2 The set G, as defined above, is a convex subset of $L^2(I \times [0,1])$.

Proof. By its construction, G is convex. Also, $\int_0^1 |g(t,\tau)|^2 \, d\tau \le 1$ for all $t \in I$ and I compact implies $g \in L^2(I \times [0,1])$. □
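The constraints defining G can be checked numerically on a sampled grid. The following Python sketch is only an illustration under stated assumptions: the grid sizes, the interval I = [0, 1], and the two pulse shapes `g1` and `g2` are hypothetical, and the rate-of-change constraint $\phi_G(g) \le K_G$ is omitted because $\phi_G$ is not fixed in the text. It tests that convex combinations of two sampled members remain members.

```python
import numpy as np

def in_G(g, dtau):
    """Check the sampled constraints of G for g[i, k] = g(t_i, tau_k), tau_k in [0, 1).

    Periodicity in tau is implicit (one period is stored); the rate-of-change
    constraint phi_G(g) <= K_G is omitted because phi_G is left unspecified.
    """
    energy = np.sum(np.abs(g) ** 2, axis=1) * dtau  # int_0^1 |g(t, tau)|^2 dtau, per t
    mean = np.sum(g, axis=1) * dtau                 # int_0^1 g(t, tau) dtau, per t
    bounded = bool(np.all(np.isfinite(g)))          # sup |g| < infinity
    return bounded and bool(np.all(energy <= 1.0)) and bool(np.min(mean) > 0.0)

n_t, n_tau = 50, 200
dtau = 1.0 / n_tau
t = np.linspace(0.0, 1.0, n_t)[:, None]             # I = [0, 1] (hypothetical T = 1)
tau = (np.arange(n_tau) * dtau)[None, :]

# Two hypothetical pulse shapes that drift slowly with t.
g1 = 0.5 * (1.0 + np.cos(2.0 * np.pi * tau)) * (0.8 + 0.2 * t)
g2 = 0.6 * np.exp(-10.0 * (tau - 0.3) ** 2) * (1.0 - 0.3 * t)

# Every convex combination of two members should again be a member.
alphas = np.linspace(0.0, 1.0, 11)
all_inside = all(in_G(a * g1 + (1.0 - a) * g2, dtau) for a in alphas)
print(all_inside)
```

The zero function fails the positive-mean test, which is the exclusion the energy normalization requires.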


Figure 64. (a) A simplified example of a time-varying glottal excitation shape. (b) A simplified example of a time-varying impulse response.

A.1.2.2 The amplitude, A. To represent the time-varying amplitude of speech, another coordinate, A, is necessary. Clearly, since the human voice is amplitude and energy limited, it is possible to bound the amplitude above by $a_{\max}$ and below by 0. Also, as with the set G above, we will want to apply some convex measure of rate of change to elements of A, denoted $\phi_A$.

The set A can be represented as

\[
A = \{\, a \in L^2(I) : 0 \le a(t) \le a_{\max} \ \forall\, t \in I,\ \phi_A(a) \le K_A \,\} .
\]

By its definition, it is clear that A is a convex subset of $L^2(I)$.

A.1.2.3 The fundamental pitch, P. The perceived pitch of the human voice is usually the pulse repetition rate of the glottal pulses (for voiced speech). The pitch coordinate, P, represents the time-varying, instantaneous pitch of the speech. For unvoiced speech, this value is less intuitively definable, but the pitch component will still be used. As with the amplitude A, the pitch may be bounded above and below by $p_{\max}$ and 0, respectively, and bounded by a suitably chosen rate of change functional, $\phi_P$.

The set P can be represented by

\[
P = \{\, p \in L^2(I) : 0 \le p(t) \le p_{\max} \ \forall\, t \in I,\ \phi_P(p) \le K_P \,\} .
\]

Again, by its definition, it is clear that P is a convex subset of $L^2(I)$.

A.1.2.4 The vocal tract response, V. The vocal tract response, V, is the time-varying system response of the vocal tract. The “system response” can be simply defined as the response of the system to an impulse (Dirac delta) at some time. Since this is a time-varying system, the system response depends on the time at which the impulse occurs as well as the time at which the system output is examined.

In this model, for $v \in V$, $v(t,\tau)$ represents the response of the system (vocal tract) at time t to an impulse at time τ. Since the vocal tract is being modeled as a causal system, $v(t,\tau) = 0$ for all $t < \tau$. Also, as with the set G, the impulse response must have finite energy for each time τ and there must be a convex functional $\phi_V$ to govern the acceptable rate of change.

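The set V can be defined as

\[
V = \Bigl\{\, v: \mathbb{R} \times I \to \mathbb{R} \;\Bigm|\; v(t,\tau) = 0 \ \ \forall\, t < \tau, \quad
\int_0^{\infty} |v(t,\tau)|^2 \, dt \le \frac{1}{T} \ \ \forall\, \tau \in I, \quad
\phi_V(v) \le K_V \,\Bigr\} .
\]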

Figure 64(b) shows an example of such a time-varying impulse response from a candidate set. Again, to show convexity of the set S, the following proposition is required.

Proposition A.1.3 The set V, as defined above, is a convex subset of $L^2(\mathbb{R} \times I)$.

Proof. By its construction, V is convex. Also, $\int_\tau^\infty |v(t,\tau)|^2 \, dt = \int_0^\infty |v(t,\tau)|^2 \, dt \le \frac{1}{T}$ for all $\tau \in I$ and I compact implies $v \in L^2(\mathbb{R} \times I)$. □


A.1.2.5 Mathematical representations of physiological constraints on rate of change. Clearly, there are physiological constraints on how fast the components of the vocal tract can change, which should translate into constraints on the elements of each of the coordinate sets, G, A, P, and V. Depending on the exact set S chosen for instantiation, these constraints can be specified in a number of ways. As discussed above, however, these constraint functionals should be convex (on the convex set on which they are defined). That is, for $X \in \{G, A, P, V\}$,

\[
\phi_X(\alpha x_1 + (1 - \alpha) x_2) \le \alpha \phi_X(x_1) + (1 - \alpha) \phi_X(x_2) \quad \forall\, x_1, x_2 \in X,\ 0 \le \alpha \le 1 .
\]




One example of a class of suitable functionals would be the norms defined on the Sobolev spaces, which are defined in $\mathbb{R}$ by [1]

\[
\|x\|_{m,p} = \Bigl( \sum_{\alpha = 0}^{m} \|D^{\alpha} x\|_p^p \Bigr)^{1/p}, \quad 1 \le p < \infty,
\qquad
\|x\|_{m,\infty} = \max_{0 \le \alpha \le m} \|D^{\alpha} x\|_{\infty},
\tag{74}
\]

where $D^0 x(t) = x(t)$, $D^{\alpha} x(t) = \frac{d^{\alpha}}{dt^{\alpha}} x(t)$, and $\|\cdot\|_p$ denotes the $L^p$ norm. These are easily shown to be convex functionals. However, they require differentiability, which may not be desirable in every case. Possibly more suitable are subsets of the class of Lipschitz functions [7], in that the rates of change of functions in these classes are bounded without requiring differentiability.


Definition A.1.4 Let $f: X \to Y$ and let p and q be metrics defined on X and Y, respectively. The function f is called a Lipschitz function if there exists a constant $M > 0$ such that $q(f(x), f(y)) \le M p(x, y)$ for all $x, y \in X$.

If  one  were  to  consider  the  subset  of  the  set  of  Lipschitz  functions  defined  by  setting 
a  maximal  value  for  M  (called  the  Lipschitz  constant)  in  the  above  definition,  perhaps  this 
subset  would  be  of  use.  In  addition,  there  are  many  other  suitable  functions.  The  correct 
choice  depends  on  the  particular  sets  chosen. 
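As an illustration of the Lipschitz option, the smallest admissible M can be estimated for a sampled function and used as a rate-of-change functional. The sketch below is a minimal example under stated assumptions: the uniform grid, the two test signals, and the function name `lipschitz_constant` are hypothetical, and both metrics are taken to be the absolute value on the real line. It also checks the convexity inequality required of the constraint functionals for one value of α.

```python
import numpy as np

def lipschitz_constant(x, dt):
    """Smallest M with |x[i] - x[j]| <= M * |t_i - t_j| on a uniform grid.

    On a uniform grid the worst ratio is attained by neighbouring samples.
    """
    return float(np.max(np.abs(np.diff(x))) / dt)

t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
x1 = np.sin(2.0 * np.pi * t)       # smooth
x2 = np.abs(t - 0.5)               # not differentiable at t = 0.5, Lipschitz constant 1

# Convexity of the functional: phi(a*x1 + (1-a)*x2) <= a*phi(x1) + (1-a)*phi(x2).
alpha = 0.3
lhs = lipschitz_constant(alpha * x1 + (1.0 - alpha) * x2, dt)
rhs = alpha * lipschitz_constant(x1, dt) + (1.0 - alpha) * lipschitz_constant(x2, dt)
print(lhs <= rhs)
```

The second signal shows the point made above: its rate of change is bounded even though it has no derivative at t = 0.5.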

A.1.2.6 Properties of the set S. The following proposition can be easily proven.

Proposition A.1.5 Let A and B be Hilbert spaces with inner products $\langle \cdot, \cdot \rangle_A$ and $\langle \cdot, \cdot \rangle_B$, respectively. Then, the Cartesian space $A \times B$ is a Hilbert space with inner product $\langle \cdot, \cdot \rangle_{A \times B}$ defined by

\[
\langle (f_A, f_B), (g_A, g_B) \rangle_{A \times B} = \langle f_A, g_A \rangle_A + \langle f_B, g_B \rangle_B .
\]

Proposition A.1.6 The set $S = G \times A \times P \times V$, as defined above, is a convex subset of the Hilbert space $L^2(I \times \mathbb{R}) \times L^2(I) \times L^2(I) \times L^2(\mathbb{R} \times I)$.

Proof. By its construction, S is indeed a subset of the Hilbert space in question. The convexity of S is easily seen by the convexity of its component sets. □




A.1.2.7 Speech production from elements of S. The “speech production process” used herein is a non-linear map, S, taking elements of the set S into $L^2(\mathbb{R}_+)$. Since all speech is of finite energy (finite support and finite amplitude), the set of all speech is a subset of $L^2(\mathbb{R}_+)$.

The action of the map S on each element of the set S is defined as

\[
S[g, a, p, v](t) = \int_0^{\min\{t, T\}} x[g, a, p](\tau) \, v(t,\tau) \, d\tau ,
\]

where the glottal excitation $x: G \times A \times P \to L^2(I)$ is defined by

\[
x[g, a, p](t) = a(t) \, N_G[g](t, w[p](t)) , \tag{75}
\]

where $N_G: G \to L^2(I \times \mathbb{R})$ is defined by

\[
N_G[g](t,\tau) = \frac{g(t,\tau)}{\bigl( \int_0^1 |g(t,\zeta)|^2 \, d\zeta \bigr)^{1/2}} ,
\]

and $w: P \to L^2(I)$ is defined by

\[
w[p](t) = \int_0^t p(\tau) \, d\tau .
\]

Observe, $w[p]$ is non-decreasing, since p is nonnegative.
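The production map can be sketched on a sampled grid. The Python below is a hedged illustration, not the author's implementation: the function names (`normalize_G`, `phase`, `excitation`, `produce`), the left Riemann sums, the nearest-sample lookup in the periodic coordinate, and every numerical value in the example (100 Hz pitch, the decaying 700 Hz resonance) are assumptions introduced here. Output is computed only on the grid of I.

```python
import numpy as np

def normalize_G(g, dtau):
    """N_G[g](t, tau) = g(t, tau) / (int_0^1 |g(t, z)|^2 dz)^(1/2), row by row."""
    return g / np.sqrt(np.sum(np.abs(g) ** 2, axis=1, keepdims=True) * dtau)

def phase(p, dt):
    """w[p](t) = int_0^t p(tau) dtau as a left Riemann sum with w[p](0) = 0."""
    return np.concatenate(([0.0], np.cumsum(p[:-1]) * dt))

def excitation(g, a, p, dt):
    """x[g, a, p](t) = a(t) * N_G[g](t, w[p](t)), using periodicity in the second slot."""
    n_t, n_tau = g.shape
    ng = normalize_G(g, 1.0 / n_tau)
    k = np.floor((phase(p, dt) % 1.0) * n_tau).astype(int) % n_tau
    return a * ng[np.arange(n_t), k]

def produce(x, v, dt):
    """s(t_i) = sum over tau_j <= t_i of x(tau_j) v(t_i, tau_j) dt."""
    return (np.tril(v) @ x) * dt

# Hypothetical example: 100 Hz pitch, constant amplitude, decaying resonance.
dt, n_t, n_tau = 1e-4, 500, 64
t = np.arange(n_t) * dt
tau = np.arange(n_tau) / n_tau
g = np.tile(1.0 + np.cos(2.0 * np.pi * tau), (n_t, 1))
a = np.full(n_t, 0.5)
p = np.full(n_t, 100.0)
lag = t[:, None] - t[None, :]
v = np.where(lag >= 0.0, np.exp(-300.0 * lag) * np.cos(2.0 * np.pi * 700.0 * lag), 0.0)

x = excitation(g, a, p, dt)
s = produce(x, v, dt)
```

Because of the normalization in `normalize_G`, rescaling g by a positive constant leaves the excitation unchanged, so the amplitude is carried entirely by a.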

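Before the operator S is shown to be well-defined, some preliminary results are proven. All norms of the form $\|\cdot\|_X$ are $L^2$ norms on the indicated set, X.

Lemma A.1.7 For every $g \in G$, $\sup_{(t,\tau) \in I \times [0,1]} |N_G[g](t,\tau)| < \infty$.

Proof. Recall $\inf_{t \in I} \int_0^1 g(t,\tau) \, d\tau > 0$ and define $\varepsilon$ by

\[
\varepsilon = \inf_{t \in I} \int_0^1 g(t,\tau) \, d\tau
\le \int_0^1 g(t,\tau) \, d\tau
\le \Bigl( \int_0^1 |g(t,\tau)|^2 \, d\tau \Bigr)^{1/2} .
\]

This gives

\[
\sup_{(t,\tau) \in I \times [0,1]} |N_G[g](t,\tau)|
= \sup_{(t,\tau) \in I \times [0,1]} \left| \frac{g(t,\tau)}{\bigl( \int_0^1 |g(t,\zeta)|^2 \, d\zeta \bigr)^{1/2}} \right|
\le \sup_{(t,\tau) \in I \times [0,1]} \left| \frac{g(t,\tau)}{\varepsilon} \right|
= \frac{1}{\varepsilon} \sup_{(t,\tau) \in I \times [0,1]} |g(t,\tau)| < \infty .
\]

□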


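Lemma A.1.8 The map $x: G \times A \times P \to L^2(I)$, as defined in (75), is well-defined.

Proof. Fix $(g, a, p) \in G \times A \times P$. By definition of A, $a \in A$ implies $|a(t)| \le a_{\max}$ for all $t \in I$. Also, by Lemma A.1.7, $g \in G$ implies $\sup_{(t,\tau) \in I \times [0,1]} |N_G[g](t,\tau)| = M < \infty$. Therefore,

\[
|a(t) \, N_G[g](t,\tau)| \le M a_{\max} \quad \forall\, (t,\tau) \in I \times [0,1] ,
\]

which, since $N_G[g]$ is periodic in its second coordinate, implies

\[
\|x[g, a, p]\|_I \le \Bigl( \int_0^T (M a_{\max})^2 \, dt \Bigr)^{1/2} = T^{1/2} M a_{\max} < \infty .
\]

Therefore, $x[g, a, p] \in L^2(I)$ for all $(g, a, p) \in G \times A \times P$.

□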


We  are  now  prepared  to  prove  the  main  theorem. 


Theorem A.1.9 The map S, from the set S into $L^2(\mathbb{R}_+)$, as defined above, is well-defined.

Proof. What we must show is that for all $(g, a, p, v) \in S$, we have $s = S[g, a, p, v] \in L^2(\mathbb{R}_+)$. Recall, $v \in V$ satisfies

\[
\int_{\mathbb{R}_+} |v(t,\tau)|^2 \, dt \le \frac{1}{T} \quad \forall\, \tau \in I . \tag{76}
\]

Hence,

\[
\int_I \int_{\mathbb{R}_+} |v(t,\tau)|^2 \, dt \, d\tau = \int_{I \times \mathbb{R}_+} |v(t,\tau)|^2 \, d(t,\tau) = \|v\|_{I \times \mathbb{R}_+}^2 \le 1 ,
\]

by Fubini’s Theorem. Since

\[
s(t) = \int_0^{\min\{t, T\}} x(\tau) \, v(t,\tau) \, d\tau ,
\]

we have that

\[
|s(t)| \le \int_0^{\min\{t, T\}} |x(\tau)| \, |v(t,\tau)| \, d\tau
\le \int_I |x(\tau)| \, |v(t,\tau)| \, d\tau
\le \Bigl( \int_I |x(\tau)|^2 \, d\tau \Bigr)^{1/2} \Bigl( \int_I |v(t,\tau)|^2 \, d\tau \Bigr)^{1/2} ,
\]

which gives

\[
\|S[g, a, p, v]\|_{\mathbb{R}_+} = \Bigl( \int_{\mathbb{R}_+} |s(t)|^2 \, dt \Bigr)^{1/2}
\le \|x\|_I \Bigl( \int_{\mathbb{R}_+} \int_I |v(t,\tau)|^2 \, d\tau \, dt \Bigr)^{1/2}
= \|x\|_I \, \|v\|_{I \times \mathbb{R}_+}
\le \|x\|_I < \infty ,
\]

with another application of Fubini’s Theorem. Therefore, $S[g, a, p, v] \in L^2(\mathbb{R}_+)$ for all $(g, a, p, v) \in S$. □

Note that for any particular space, it is possible that not all examples of speech can be represented. However, for any example of speech, there exists some specific space from which it can be produced. Also, unless carefully constructed, any given space may contain multiple representations of the same example of speech. This is undesirable, since then any metric on the space will show equivalent elements to be dissimilar sounding. If, for some reason, such a space is useful despite the multiple representations, it will probably be necessary to work with a pseudo-metric instead of a metric.

A.1.2.8 The class of metrics, d. One way to define a metric on a Cartesian product of sets is to base it on metrics of the component sets. That is, an obvious family of metrics for S is given by

\[
d((g_1, a_1, p_1, v_1), (g_2, a_2, p_2, v_2)) = \alpha_G d_G(g_1, g_2) + \alpha_A d_A(a_1, a_2) + \alpha_P d_P(p_1, p_2) + \alpha_V d_V(v_1, v_2) ,
\]

where $d_G$, $d_A$, $d_P$, and $d_V$ are metrics on G, A, P, and V, respectively, and $\alpha_G, \alpha_A, \alpha_P, \alpha_V > 0$ are weighting factors. Such a function is clearly a metric on the set S.


Note that the energy normalization done in the speech production process on the G and V components effectively defines equivalence classes within the set S. Therefore, a suitable pseudo-metric might be given by

\[
d_p((g_1, a_1, p_1, v_1), (g_2, a_2, p_2, v_2)) = \alpha_G d_G(N_G[g_1], N_G[g_2]) + \alpha_A d_A(a_1, a_2) + \alpha_P d_P(p_1, p_2) + \alpha_V d_V(N_V[v_1], N_V[v_2]) .
\]
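The difference between the weighted product metric and the pseudo-metric can be seen on sampled data. The sketch below assumes, purely for illustration, that each component metric is a discrete $L^2$ distance; the text does not fix $d_G$, $d_A$, $d_P$, or $d_V$. The function names, weights, and random test data are hypothetical, and only the G component is normalized, because $N_V$ is not defined in this passage.

```python
import numpy as np

def l2_distance(u, w, cell):
    """Discrete L2 distance; `cell` is the grid-cell size (dt, or dt * dtau)."""
    return float(np.sqrt(np.sum(np.abs(u - w) ** 2) * cell))

def product_metric(s1, s2, weights, cells):
    """d(s1, s2) = sum of weight_k * d_k(component_k of s1, component_k of s2)."""
    return sum(wt * l2_distance(c1, c2, cell)
               for c1, c2, wt, cell in zip(s1, s2, weights, cells))

def normalize_rows(g, dtau):
    """Energy-normalize each row, as N_G does."""
    return g / np.sqrt(np.sum(np.abs(g) ** 2, axis=1, keepdims=True) * dtau)

def pseudo_metric(s1, s2, weights, cells, dtau):
    """As product_metric, but comparing N_G[g1] with N_G[g2] in the G slot."""
    n1 = (normalize_rows(s1[0], dtau),) + tuple(s1[1:])
    n2 = (normalize_rows(s2[0], dtau),) + tuple(s2[1:])
    return product_metric(n1, n2, weights, cells)

rng = np.random.default_rng(0)
n_t, n_tau, dt = 20, 16, 0.01
dtau = 1.0 / n_tau
g = 1.0 + rng.random((n_t, n_tau))
a, p, v = rng.random(n_t), rng.random(n_t), rng.random((n_t, n_t))
weights = (1.0, 2.0, 0.5, 1.0)
cells = (dt * dtau, dt, dt, dt * dt)

s1 = (g, a, p, v)
s2 = (0.5 * g, a, p, v)      # same point after energy normalization of g
d_metric = product_metric(s1, s2, weights, cells)
d_pseudo = pseudo_metric(s1, s2, weights, cells, dtau)
```

Here `s1` and `s2` produce the same excitation, so the metric reports a positive distance between equivalent elements while the pseudo-metric reports essentially zero.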

A.1.2.9 Space shortcomings and additional work needed. This model is suitable for characterizing speech where the excitation is solely from the area of the vocal cords. This describes vowels and some consonants (e.g., l, r, y) well, but does not describe the important classes of fricatives (e.g., s, sh, z, f) or plosives (e.g., p, b, t). To handle these sounds in a natural way would require a Cartesian space, S, with additional coordinates, and would be less mathematically tractable.

One approach is to assume no more than one additional constriction (the source of turbulence for fricatives). Then consider the impulse responses before and after the constriction, with the glottal excitation affected by both impulse responses and the additional turbulence only affected by the second. For a time-invariant system, this could be represented mathematically by

\[
s(t) = (f[x_1 * v_1] * v_2)(t) ,
\]

where $x_1$ is the glottal excitation, $v_1$ and $v_2$ represent the vocal tract responses before and after the constriction, respectively, and f is a nonlinear function which changes some of the signal from before the constriction into “turbulence” (i.e., wideband noise). This model collapses to the simple model discussed previously for appropriately chosen f and $v_2$.
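The collapse to the simple model can be checked numerically by taking f to be the identity and $v_2$ to be a discrete unit impulse. The Python sketch below makes several assumptions not in the text: convolutions are causal Riemann sums on a uniform grid, and the signals, the decay rates, and the noise-injecting function `turbulent` are hypothetical stand-ins.

```python
import numpy as np

def causal_conv(u, h, dt):
    """(u * h)(t) = int_0^t u(tau) h(t - tau) dtau on a uniform grid."""
    return np.convolve(u, h)[: len(u)] * dt

def two_stage(x1, v1, v2, f, dt):
    """s = f[x1 * v1] * v2."""
    return causal_conv(f(causal_conv(x1, v1, dt)), v2, dt)

dt, n = 1e-3, 400
t = np.arange(n) * dt
x1 = np.sin(2.0 * np.pi * 5.0 * t)
v1 = np.exp(-20.0 * t)

# Collapse to the simple model: f is the identity and v2 is a discrete delta.
delta = np.zeros(n)
delta[0] = 1.0 / dt
s_two = two_stage(x1, v1, delta, lambda u: u, dt)
s_one = causal_conv(x1, v1, dt)

# A hypothetical f that converts part of the flow into wideband noise.
rng = np.random.default_rng(1)

def turbulent(u):
    return u + 0.2 * np.abs(u) * rng.standard_normal(len(u))

s_fric = two_stage(x1, v1, np.exp(-200.0 * t), turbulent, dt)
```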

A. 2  Linear  System  Background 

Much  work  has  been  put  into  modeling  the  vocal  tract  as  a  linear  system.  Most 
approaches  use  the  frozen  time  approximation  to  the  time-varying  system  which,  while 
a  good  approximation  for  most  portions  of  speech,  is  inappropriate  for  rapid  transitions. 
These  areas,  however,  are  often  extremely  important  in  making  determinations  of  identities 
of  consonants.  Therefore,  it  is  appropriate  to  examine  approximations  which  are  based  on 
time- varying  systems. 


The work presented in this section is closely related to the speech space presented in Section A.1 through the mapping S. By fixing the components g, a, and p, this mapping can be considered to be a linear operator on $v(t, \cdot) \in L^2(\mathbb{R}_+)$.

Section A.2.1, below, will review linear systems briefly and introduce the class of linear systems which we will be using to represent the speech process. Approximations to this class of linear system will also be discussed.

Section A.2.2 presents speech as the product of a time-varying, linear system. Based on some of the characteristics of speech, a “τ-varying poles” model of speech is presented.

A. 2.1  Linear  Systems.  This  section  is  an  introduction  to  linear  systems,  as  we 
intend  to  use  the  term.  The  general  (time-varying)  case  will  be  discussed,  as  well  as  the 
special  case  of  time  invariant  systems. 

A. 2. 1.1  General  Linear  Systems.  Because  the  work  presented  here  is  of  a 
mathematical  nature,  linear  systems  will  be  presented  as  mathematical  entities  to  facilitate 
future  discussion. 

Definition A.2.1 Let X and Y be linear spaces over the field F. A linear mapping is a mapping $A: X \to Y$ for which

\[
A[\alpha_1 x_1 + \alpha_2 x_2] = \alpha_1 A x_1 + \alpha_2 A x_2
\]

for all $x_1, x_2 \in X$, $\alpha_1, \alpha_2 \in F$.

Definition A.2.2 Let $R_x$ and $R_y$ be linear spaces over $\mathbb{R}$ and let X be a linear space of functions of the form $x: \mathbb{R} \to R_x$ and let Y be a linear space of functions of the form $y: \mathbb{R} \to R_y$. Let $A: X \to Y$ be a linear mapping. Then, the equation

\[
Ax = y
\]

describes a linear system with “input” $x \in X$ producing “output” $y \in Y$. Such a linear system is called causal if whenever $x(t) = 0$ for all $t < \tau$, $Ax(t) = 0$ for all $t < \tau$. Such a linear system is called time invariant if

\[
Ax(t + \tau) = A x_\tau(t)
\]

for all $t, \tau \in \mathbb{R}$ where $x_\tau(t) = x(t + \tau)$. All linear systems which are not time invariant are called time-varying linear systems.

A.2.1.2 Analysis of Linear Systems. Given a linear system, there are many questions which may be asked. For example, one may wish to examine the relation between input x and output y for a given system, S. While (in general) y can be easily determined for a given x, the opposite is generally not true. More difficult yet is when neither x nor S is known with much precision, and yet some best S and x must be found for some known y. This describes the process of analyzing recorded speech.

Below,  three  specific  examples  of  linear  systems  are  given.  In  each  case,  the  mapping 
A  is  representable  as  an  integral. 

Impulse Response. Suppose that for a given linear system described by the linear mapping $A: X \to Y$, one has an input x and wishes to know the output y. This would be given by

\[
y = Ax .
\]

One common way to solve this problem is to solve for the impulse response of the system. That is, determine v (if possible) such that

\[
Ax(t) = \int_0^t v(t,\tau) \, x(\tau) \, d\tau ,
\]

where, for simplicity, A is assumed to be causal and to produce 0 output before time 0.

Transfer Function. In a time-invariant linear system, the impulse response, $v(t,\tau)$, can be simplified to a function h, defined by $h(t - \tau) = v(t,\tau)$ where, due to the time-invariant nature of the system, this relation holds true for any choice of $\tau \in \mathbb{R}_+$.

The output from such a system is given by

\[
y(t) = \int_0^t v(t,\tau) \, x(\tau) \, d\tau
= \int_0^t h(t - \tau) \, x(\tau) \, d\tau
= [h * x](t) .
\]

Using the fact that

\[
[h * x](t) = \mathcal{L}^{-1}[\mathcal{L}[h] \, \mathcal{L}[x]](t) ,
\]

where $\mathcal{L}$ represents the well known Laplace transform, this gives

\[
y(t) = \mathcal{L}^{-1}[HX](t) ,
\]

where $H(s) = \mathcal{L}[h](s)$ and $X(s) = \mathcal{L}[x](s)$. The function H is often called the transfer function of the linear system.
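A worked check of the relation $y = \mathcal{L}^{-1}[HX]$ can be made with a pair whose transforms are known in closed form. In the sketch below, the choices $h(t) = e^{-at}$ and $x(t) = e^{-bt}$, the values of a and b, and the grid are illustrative assumptions. The time-domain convolution is approximated by a Riemann sum and compared with the partial-fraction inverse of $H(s)X(s) = 1/((s+a)(s+b))$.

```python
import numpy as np

a, b = 2.0, 5.0
dt = 1e-3
t = np.arange(0.0, 5.0, dt)
h = np.exp(-a * t)      # H(s) = 1 / (s + a)
x = np.exp(-b * t)      # X(s) = 1 / (s + b)

# Time domain: y(t) = int_0^t h(t - tau) x(tau) dtau, as a Riemann sum.
y_time = np.convolve(h, x)[: len(t)] * dt

# Transform domain: H(s) X(s) = (1/(s + a) - 1/(s + b)) / (b - a), inverted term by term.
y_laplace = (np.exp(-a * t) - np.exp(-b * t)) / (b - a)

max_error = float(np.max(np.abs(y_time - y_laplace)))
```

The two results agree up to the discretization error of the Riemann sum, which is of the order of the step `dt`.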

Zadeh’s System Function. Zadeh’s system function [8] can be thought of as a transfer function for time-varying linear systems. A brief development of it, showing its similarity in concept to the transfer function, is given here.

Assume that a linear system describable by a mapping $A: X \to Y$ has associated with it an impulse response v. The output y from this system for a given input x is given by

\[
y(t) = \int_{-\infty}^{\infty} x(\tau) \, v(t,\tau) \, d\tau ,
\]

which, assuming causality and (for simplicity) that the system produces only 0 output for $t < 0$, becomes

\[
y(t) = \int_0^t x(\tau) \, \hat{v}(t, t - \tau) \, d\tau ,
\]

where $\hat{v}$ is defined by $\hat{v}(t,\tau) = v(t, t - \tau)$. Then, for each t, y(t) is given by

\[
y(t) = \int_0^t x(\tau) \, \hat{v}_t(t - \tau) \, d\tau = [x * \hat{v}_t](t) ,
\]

where $\hat{v}_t(\cdot) = \hat{v}(t, \cdot)$. Applying the Laplace transform, we find

\[
y(t) = \mathcal{L}^{-1}[X Z_t](t) ,
\]

where X and $Z_t$ are the Laplace transforms of x and $\hat{v}_t$, respectively. Defining a function $Z: \mathbb{R}_+ \times \mathbb{C}_+ \to \mathbb{C}$ by $Z(t, s) = Z_t(s)$, we have Zadeh’s system function, given by

\[
\begin{aligned}
Z(t, s) = Z_t(s) &= \mathcal{L}[\hat{v}_t](s) \\
&= \int_0^{\infty} \hat{v}(t, \xi) \, e^{-s\xi} \, d\xi \\
&= \int_0^{\infty} v(t, t - \xi) \, e^{-s\xi} \, d\xi \\
&= \int_{-\infty}^{t} v(t,\tau) \, e^{-s(t - \tau)} \, d\tau .
\end{aligned}
\]

The last form is the one usually given as Zadeh’s system function. This gives for y,

\[
y(t) = \mathcal{L}^{-1}[X(\cdot) Z_t(\cdot)](t) = \mathcal{L}^{-1}[X(\cdot) Z(t, \cdot)](t) .
\]

This  last  form  shows  the  similarity  in  concept  to  the  transfer  function  used  for  the  time- 
invariant  case. 
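Zadeh's system function can be evaluated numerically for a simple time-varying kernel. The sketch below rests on assumptions introduced only for illustration: the kernel $v(t,\tau) = e^{-a(t)(t-\tau)}$ with a hypothetical decay rate $a(t) = 3 + \sin t$, taken to hold for all $\tau \le t$, a truncated integration range, and the trapezoid rule. For this kernel the integral has the closed form $Z(t,s) = 1/(s + a(t))$, which looks like a one-pole transfer function whose pole moves with t.

```python
import numpy as np

def zadeh(v, t, s, xi_max=20.0, n=200_001):
    """Z(t, s) = int_0^inf v(t, t - xi) exp(-s xi) dxi, truncated at xi_max (trapezoid rule)."""
    xi = np.linspace(0.0, xi_max, n)
    f = v(t, t - xi) * np.exp(-s * xi)
    return (xi[1] - xi[0]) * (np.sum(f) - 0.5 * (f[0] + f[-1]))

def decay(t):
    """Hypothetical slowly varying decay rate a(t) >= 2."""
    return 3.0 + np.sin(t)

def v(t, tau):
    """Hypothetical time-varying impulse response, used here for all tau <= t."""
    return np.exp(-decay(t) * (t - tau))

s = 1.0 + 2.0j
t0 = 0.7
numeric = zadeh(v, t0, s)
closed_form = 1.0 / (s + decay(t0))
```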

A.2.1.3 Linear Systems Describable by Integral Transforms. In Section A.1, a mapping S was introduced that maps elements of a speech space into “speech” (that is, the $L^2(\mathbb{R}_+)$ representation of speech). This mapping can also be considered an element of the class of linear systems described below. This class, which I believe will be of particular use in representing the speech process, is describable by the linear operator $S_v: L^2(\mathbb{R}_+) \to L^2(\mathbb{R}_+)$, where $S_v$ is defined by

\[
S_v x(t) = \int_0^t x(\tau) \, v(t,\tau) \, d\tau ,
\]

where $v \in V$ and

\[
V = \{\, v \in L^2(\mathbb{R}_+ \times \mathbb{R}_+) : v(t,\tau) = 0 \ \forall\, t < \tau \,\} .
\]

In words, the set V represents a rather large set of causal, time-varying, impulse responses.

Theorem A.2.3 The operator $S_v: L^2(\mathbb{R}_+) \to L^2(\mathbb{R}_+)$, as defined above, is well-defined.


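Proof. It is sufficient to show that $\|S_v x\| < \infty$ for arbitrarily chosen $v \in V$ and $x \in L^2(\mathbb{R}_+)$. First,

\[
\begin{aligned}
|S_v x(t)|^2 &= \Bigl| \int_0^t v(t,\tau) \, x(\tau) \, d\tau \Bigr|^2 \\
&\le \Bigl( \int_0^t |v(t,\tau)| \, |x(\tau)| \, d\tau \Bigr)^2 \\
&\le \int_0^t |v(t,\tau)|^2 \, d\tau \int_0^t |x(\tau)|^2 \, d\tau \\
&\le \|x\|^2 \int_0^t |v(t,\tau)|^2 \, d\tau .
\end{aligned}
\]

Next,

\[
\int_0^T |S_v x(t)|^2 \, dt \le \|x\|^2 \int_0^T \int_0^t |v(t,\tau)|^2 \, d\tau \, dt \le \|x\|^2 \|v\|^2 .
\]

Since this is true for any choice of $T \in \mathbb{R}_+$, we have

\[
\int_0^{\infty} |S_v x(t)|^2 \, dt \le \|x\|^2 \|v\|^2 < \infty .
\]

□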


Corollary A.2.4 For all $v \in V$, $S_v$ is a bounded linear operator.

Proof.

\[
\|S_v\| = \sup_{\|x\| \neq 0} \frac{\|S_v x\|}{\|x\|} \le \sup_{\|x\| \neq 0} \frac{\|x\| \, \|v\|}{\|x\|} = \|v\| .
\]

□
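The bound $\|S_v\| \le \|v\|$ has a direct discrete counterpart: on a uniform grid $S_v$ becomes a lower-triangular matrix, its operator norm is the spectral norm, and $\|v\|$ becomes a scaled Frobenius norm. The sketch below is an illustration only; the kernel, grid, and random test input are hypothetical choices.

```python
import numpy as np

n, dt = 200, 0.01
t = np.arange(n) * dt
lag = t[:, None] - t[None, :]
# Hypothetical causal kernel: v(t, tau) = 0 for t < tau.
v = np.where(lag >= 0.0, np.exp(-3.0 * np.abs(lag)) * np.cos(5.0 * t[None, :]), 0.0)

A = np.tril(v) * dt                                  # matrix of the discretized S_v
operator_norm = np.linalg.norm(A, 2)                 # sup ||S_v x|| / ||x||
kernel_norm = np.sqrt(np.sum(np.abs(v) ** 2)) * dt   # discrete ||v|| on R+ x R+

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
ratio = np.linalg.norm(A @ x) / np.linalg.norm(x)    # the dt weights cancel in the ratio
```

For any input, `ratio` cannot exceed `operator_norm`, which in turn cannot exceed `kernel_norm`.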


By varying $v \in V$, we may define a set, $\mathcal{L}$, of linear systems indexed on v. That is,

\[
\mathcal{L} = \Bigl\{\, S_v: L^2(\mathbb{R}_+) \to L^2(\mathbb{R}_+) : v \in V,\ S_v x(t) = \int_0^t x(\tau) \, v(t,\tau) \, d\tau \,\Bigr\} .
\]

This is a very large set of linear systems. In fact, it will be shown that the closure of the set of ranges of operators in this set is all of $L^2(\mathbb{R}_+)$. The significance of this with regard to speech is that any example of speech can be produced by some such operator, assuming (quite reasonably) that speech is an element of $L^2(\mathbb{R}_+)$.

Let $\mathcal{R}_{\mathcal{L}} = \bigcup_{v \in V} \mathcal{R}(S_v)$, where $\mathcal{R}(S_v)$ denotes the range of the operator $S_v$. Define E by

\[
E = \{\, s \in L^2(\mathbb{R}_+) : s(t) = 0 \text{ for } t \in [0, \varepsilon] \text{ for some } \varepsilon > 0 \,\} .
\]

Lemma A.2.5 $\overline{E} = L^2(\mathbb{R}_+)$.

Proof. By construction, $E \subset L^2(\mathbb{R}_+)$. Therefore, to conclude the proof, all that is necessary is to show that for $f \in L^2(\mathbb{R}_+)$ arbitrarily chosen, there exists a sequence $\{f_n\}_{n=1}^{\infty}$, $f_n \in E$, such that $f_n \to f$.

Given $f \in L^2(\mathbb{R}_+)$, construct $\{f_n\}_{n=1}^{\infty}$ according to

\[
f_n(t) = \begin{cases} f(t) & t > \frac{1}{n} \\ 0 & \text{otherwise.} \end{cases}
\]

Clearly, $f_n \in E$ for each $1 \le n < \infty$.

To show $f_n \to f$, it is only necessary to note that $|f(t) - f_n(t)|^2$ converges pointwise almost everywhere to 0 on the interval [0, 1], that $|f(t) - f_n(t)|^2$ is bounded above by $|f(t)|^2$, and that

\[
\int_0^{\infty} |f(t) - f_n(t)|^2 \, dt = \int_0^{1/n} |f(t)|^2 \, dt \to 0 ,
\]

as $n \to \infty$. Therefore, $f_n \to f$, and hence, $f \in \overline{E}$. □


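Lemma A.2.6 $E \subset \mathcal{R}_{\mathcal{L}}$.

Proof. Let $s \in E$. Thus there exists $\varepsilon > 0$ such that $s(t) = 0$ for $t \in [0, \varepsilon]$. To complete the proof, we exhibit an $x \in L^2(\mathbb{R}_+)$ and $v \in L^2(\mathbb{R}_+ \times \mathbb{R}_+)$ such that $s = S_v x$.

Define x by

\[
x(t) = \begin{cases} \frac{1}{\sqrt{\varepsilon}} & 0 \le t \le \varepsilon \\ 0 & \text{otherwise} \end{cases}
\]

and define v by

\[
v(t,\tau) = v_1(t) \, v_2(\tau) ,
\]

where

\[
v_1(t) = s(t)
\]

and

\[
v_2(\tau) = \begin{cases} \frac{1}{\sqrt{\varepsilon}} & 0 \le \tau \le \varepsilon \\ 0 & \text{otherwise.} \end{cases}
\]

Clearly $x \in L^2(\mathbb{R}_+)$. Examining $\|v\|$, we see

\[
\begin{aligned}
\|v\|^2 &= \int_{\mathbb{R}_+ \times \mathbb{R}_+} |v(t,\tau)|^2 \, d(t,\tau) \\
&= \int_{\mathbb{R}_+ \times \mathbb{R}_+} |v_1(t) \, v_2(\tau)|^2 \, d(t,\tau) \\
&= \int_0^{\infty} \int_0^{\infty} |v_1(t)|^2 \, |v_2(\tau)|^2 \, d\tau \, dt \\
&= \int_0^{\infty} |v_1(t)|^2 \int_0^{\infty} |v_2(\tau)|^2 \, d\tau \, dt \\
&= \int_0^{\infty} |v_1(t)|^2 \int_0^{\varepsilon} \frac{1}{\varepsilon} \, d\tau \, dt \\
&= \int_0^{\infty} |v_1(t)|^2 \, dt \\
&= \int_0^{\infty} |s(t)|^2 \, dt \\
&= \|s\|^2 < \infty .
\end{aligned}
\]

Therefore, $v \in L^2(\mathbb{R}_+ \times \mathbb{R}_+)$. We now show that $s = S_v x$. We have

\[
S_v x(t) = \int_0^t x(\tau) \, v(t,\tau) \, d\tau = v_1(t) \int_0^t x(\tau) \, v_2(\tau) \, d\tau .
\]

Examining the case where $t \in [0, \varepsilon]$ we see that $v_1(t) = s(t) = 0$, and so $S_v x(t) = 0 = s(t)$. Examining the case where $t \in [\varepsilon, \infty)$, we get

\[
\int_0^t x(\tau) \, v_2(\tau) \, d\tau = \int_0^{\varepsilon} \frac{1}{\varepsilon} \, d\tau = 1 ,
\]

and so $S_v x(t) = v_1(t) = s(t)$. Therefore, $s \in \mathcal{R}(S_v) \subset \mathcal{R}_{\mathcal{L}}$.

□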
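The construction in this proof can be reproduced on a grid. In the hedged Python sketch below, the target signal s, the grid, and the value of ε are hypothetical; x and $v_2$ are both set to $\varepsilon^{-1/2}$ on the first `n_eps` samples, which is the discrete stand-in for the interval $[0, \varepsilon]$, so that their product integrates to one.

```python
import numpy as np

dt, n, n_eps = 1e-3, 1000, 100
eps = n_eps * dt
t = np.arange(n) * dt

# A target signal s in E: zero on [0, eps].
s = np.sin(8.0 * t) * np.exp(-t)
s[: n_eps + 1] = 0.0

x = np.zeros(n)
x[:n_eps] = eps ** -0.5        # x = eps^(-1/2) on [0, eps), zero elsewhere
v2 = x.copy()                  # v2 has the same profile as x
v = np.outer(s, v2)            # v(t, tau) = v1(t) v2(tau) with v1 = s

Svx = (np.tril(v) @ x) * dt    # S_v x(t) = int_0^t x(tau) v(t, tau) dtau
norm_v = np.sqrt(np.sum(v ** 2) * dt * dt)
norm_s = np.sqrt(np.sum(s ** 2) * dt)
```

The discrete output `Svx` reproduces s, and the kernel norm equals the norm of s, matching the computation of $\|v\|$ above. The kernel built this way also vanishes for $t < \tau$.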


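Theorem A.2.7 $\overline{\mathcal{R}_{\mathcal{L}}} = L^2(\mathbb{R}_+)$.

Proof. $E \subset \mathcal{R}_{\mathcal{L}} \subset L^2(\mathbb{R}_+)$ and $\overline{E} = L^2(\mathbb{R}_+)$, which implies $\overline{\mathcal{R}_{\mathcal{L}}} = L^2(\mathbb{R}_+)$. □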


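A.2.1.4 Approximations to $S_v$. Here, it will be shown that approximations to v yield equally close approximations to $S_v$.

Theorem A.2.8 Fix $\varepsilon > 0$ and let $v \in V$ and $\tilde{v} \in L^2(\mathbb{R}_+ \times \mathbb{R}_+)$ be such that $\|v - \tilde{v}\|_2 < \varepsilon$. Then, $\|S_v - S_{\tilde{v}}\|_2 < \varepsilon$.

Proof.

\[
\begin{aligned}
\|S_v - S_{\tilde{v}}\|_2 &= \sup_{\|x\| \le 1} \|S_v x - S_{\tilde{v}} x\| \\
&= \sup_{\|x\| \le 1} \Bigl( \int_0^{\infty} |S_v x(t) - S_{\tilde{v}} x(t)|^2 \, dt \Bigr)^{1/2} \\
&= \sup_{\|x\| \le 1} \Bigl( \int_0^{\infty} \Bigl| \int_0^t x(\tau) \, v(t,\tau) \, d\tau - \int_0^t x(\tau) \, \tilde{v}(t,\tau) \, d\tau \Bigr|^2 dt \Bigr)^{1/2} \\
&= \sup_{\|x\| \le 1} \Bigl( \int_0^{\infty} \Bigl| \int_0^t x(\tau) \bigl( v(t,\tau) - \tilde{v}(t,\tau) \bigr) \, d\tau \Bigr|^2 dt \Bigr)^{1/2} \\
&= \sup_{\|x\| \le 1} \Bigl( \int_0^{\infty} |S_{(v - \tilde{v})} x(t)|^2 \, dt \Bigr)^{1/2} \\
&= \sup_{\|x\| \le 1} \|S_{(v - \tilde{v})} x\| \\
&= \|S_{(v - \tilde{v})}\|_2 \\
&\le \|v - \tilde{v}\|_2 \\
&< \varepsilon .
\end{aligned}
\]

□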




Therefore, any approximation $\tilde{v}$ to v also yields an equally close approximation $S_{\tilde{v}}$ to $S_v$. This statement is important because it allows us to approximate $S_v$ by approximating v.
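A small numerical check of this statement follows. It is a sketch under assumptions: the kernel is a hypothetical damped oscillation, the approximation $\tilde{v}$ is obtained by rounding v to one decimal place, and norms are the discrete analogues used for sampled operators (spectral norm for operators, scaled Frobenius norm for kernels).

```python
import numpy as np

n, dt = 150, 0.02
t = np.arange(n) * dt
lag = t[:, None] - t[None, :]
v = np.where(lag >= 0.0, np.exp(-2.0 * np.abs(lag)) * np.cos(6.0 * lag), 0.0)
v_tilde = np.round(v, 1)                       # a crude approximation of v

# Discrete ||S_v - S_v_tilde||: spectral norm of the difference of the two matrices.
operator_gap = np.linalg.norm(np.tril(v - v_tilde) * dt, 2)
# Discrete ||v - v_tilde|| on R+ x R+.
kernel_gap = np.sqrt(np.sum(np.abs(v - v_tilde) ** 2)) * dt
```

The operator gap never exceeds the kernel gap, so any tolerance achieved on the kernel carries over to the operator.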


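A.2.1.5 Approximations to v. We now explore ways to approximate $v \in V$ since we know that such results can be used to approximate $S_v$.

I wish to show that my set V can be approximated by a more restricted set $\tilde{V}$. Let P be a countable basis for $L^2(\mathbb{R}_+)$ and define $\tilde{V}$ by

\[
\tilde{V} = \Bigl\{\, \tilde{v} \in L^2(\mathbb{R}_+ \times \mathbb{R}_+) \;\Bigm|\;
\tilde{v}(t,\tau) = \begin{cases} p_\tau(t - \tau) & t \ge \tau \\ 0 & \text{otherwise,} \end{cases}
\quad p_\tau \in \operatorname{span}(P) \text{ for a.e. } \tau \in \mathbb{R}_+ \,\Bigr\} .
\]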

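Theorem A.2.9 For V and $\tilde{V}$ as defined above, $V \subset \overline{\tilde{V}}$.

Proof. To prove this theorem, we need to show that for any nonzero $v \in V$ and any $\varepsilon > 0$, we can find $\tilde{v} \in \tilde{V}$ such that $\|v - \tilde{v}\|_2 < \varepsilon$. So, fix $v \in V$ and $\varepsilon > 0$.

Since $v \in L^2(\mathbb{R}_+ \times \mathbb{R}_+)$, we have that $v(\cdot, \tau) \in L^2(\mathbb{R}_+)$ for a.e. choice of τ. Also, $v(t,\tau) = 0$ for all $t < \tau$. Therefore, $v_\tau = v(\cdot + \tau, \tau)$ can be approximated by some $p_\tau \in \operatorname{span}(P)$ such that

\[
\|v_\tau - p_\tau\|_2 < \cdots
\]

Defining $\tilde{v}(t,\tau)$ by

\[
\tilde{v}(t,\tau) = \begin{cases} p_\tau(t - \tau) & t \ge \tau \\ 0 & \text{otherwise} \end{cases}
\]

for a.e. τ, we see that

\[
\int_0^{\infty} \int_0^{\infty} |v(t,\tau) - \tilde{v}(t,\tau)|^2 \, dt \, d\tau \le \cdots
\]

Since $\tilde{v}$ defined in this way is an element of $\tilde{V}$, this proves the theorem.

□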


A.2.2 Linear System Representation of Speech.

A.2.2.1 Speech as the Output of a Linear System. As proven in Theorem A.2.7, the closure of the range of the linear operator $S_v: L^2(\mathbb{R}_+) \to L^2(\mathbb{R}_+)$ is all of $L^2(\mathbb{R}_+)$, where $S_v$ is defined by

\[
S_v x(t) = \int_0^t x(\tau) \, v(t,\tau) \, d\tau ,
\]

where $v \in V$ and where

\[
V = \{\, v \in L^2(\mathbb{R}_+ \times \mathbb{R}_+) : v(t,\tau) = 0 \ \forall\, t < \tau \,\} .
\]

Making the quite reasonable assumption that any given example of speech, s, is an element of $L^2(\mathbb{R}_+)$, we have now established that speech can be produced by such a linear system. Therefore, a linear system that can produce that example of speech for an appropriate input (excitation) can be approximated as shown in Theorems A.2.8 and A.2.9.

However, the linear systems that represent speech have additional characteristics, which can be incorporated into a representation. Below, we will discuss a “τ-varying poles” model of the impulse response of speech. Following that, the form that Zadeh’s system function takes for the τ-varying poles model will be examined briefly.

A.2.2.2 Rational Functions. Rational functions are often used to represent the Laplace transform of time-invariant impulse responses. Indeed, they are the logical representation for certain classes of time-invariant linear systems.

A rational function $f: \mathbb{C} \to \mathbb{C}$ can be expressed in the form

\[
f(s) = c \, \frac{\prod_{m=1}^{M} (s - z_m)}{\prod_{m=1}^{N} (s - p_m)} .
\]

The $p_m$’s are called poles of f and the $z_m$’s are called zeros of f. The function f is analytic everywhere in $\mathbb{C}$ except at the poles.

A.2.2.3 “τ-Varying Pole” Representation of Speech. When one performs a windowed Fourier analysis of a sample of voiced speech for a short duration window, one sees results similar to those one would see from the output of a time-invariant linear system describable by a system function consisting of a low-order rational function. Particularly, a time-invariant linear system which may be modeled by a system function with 3-4 poles and at most 1 zero can be found to produce similar (in a frequency analysis sense) results to the speech. Performing a windowed Fourier analysis on a slightly shifted window produces results that can be modeled by a simple pole-zero model with the poles and zeros in slightly different locations. One implication of this is that the vocal tract behaves like a slowly varying linear system.

This also is the impetus to model the vocal tract as a linear system with “τ-varying poles.” That is, the impetus to model the vocal tract as a system which, if frozen at any instant in time, has a transfer function modeled by a rational function. So, an impulse at time τ produces output identical to that of some time-invariant system modeled by a rational function. At different times τ, the equivalent time-invariant system has different poles and zeros. Here, this will be referred to as a “τ-varying pole” model of speech production.

Such a system is represented by an impulse response of the form

$$ v(t,\tau) = \mathcal{L}^{-1}[F(\cdot,\tau)](t-\tau) , $$

where

$$ F(s,\tau) = \frac{\sum_{i=0}^{M} \alpha_i(\tau)\, s^i}{\sum_{i=0}^{N} \beta_i(\tau)\, s^i} , $$

where the coefficients of the rational function vary with $\tau$. This function, $F$, will be referred to as the “frozen time system function.” One implication of this form is that the system response, $v(t,\tau)$, for an impulse at some fixed time $\tau$ depends only on the values of the $\alpha_i$’s and the $\beta_i$’s at time $\tau$, and is totally independent of their values at other times.

Note the difference between the frozen time system function with $\tau$-varying poles and Zadeh’s system function with time-varying poles. In the first case, the poles are in the function transformed with respect to the first variable, whereas, with Zadeh, the transform is with respect to the second variable.
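To make the frozen time idea concrete, the sketch below evaluates $v(t,\tau)$ for one simple special case that is assumed here purely for illustration: a single complex-conjugate pole pair at $-\sigma(\tau) \pm i\omega(\tau)$ with no zeros, so that $F(s,\tau) = \omega(\tau)/((s+\sigma(\tau))^2 + \omega(\tau)^2)$, whose inverse Laplace transform is $e^{-\sigma u}\sin(\omega u)$ for $u \ge 0$. The pole trajectories `demo_sigma` and `demo_omega` are hypothetical and are not taken from any measured speech.

```python
import math


def frozen_time_impulse_response(t, tau, sigma, omega):
    """v(t, tau) for F(s, tau) = omega(tau) / ((s + sigma(tau))**2 + omega(tau)**2).

    sigma, omega : callables giving the decay rate and angular frequency of the
                   pole pair as functions of the impulse time tau (assumed forms).
    """
    u = t - tau
    if u < 0:
        return 0.0                      # causal: no response before the impulse
    s, w = sigma(tau), omega(tau)       # poles are frozen at their values at time tau
    return math.exp(-s * u) * math.sin(w * u)


# Hypothetical pole trajectories: fixed damping, resonance drifting upward with tau.
def demo_sigma(tau):
    return 300.0


def demo_omega(tau):
    return 2.0 * math.pi * (500.0 + 2000.0 * tau)


# Responses to impulses at two different times, each observed 0.5 ms later.
print(frozen_time_impulse_response(0.0005, 0.0, demo_sigma, demo_omega))
print(frozen_time_impulse_response(0.0505, 0.05, demo_sigma, demo_omega))
```

Note that the response to an impulse at time $\tau$ uses only `sigma(tau)` and `omega(tau)`, which mirrors the observation above that $v(t,\tau)$ is independent of the coefficient values at other times.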

If the frozen time system function representation were “perfect,” then it would be possible to examine the configuration of the vocal tract at time $\tau$ and, from that configuration, determine the impulse response, $v(\cdot,\tau)$, for an impulse at that time. However, intuitively one can see that this is not so. Since the excitation takes a non-zero time to propagate through the vocal tract, and the vocal tract is an unpredictable (i.e., past speech does not determine future speech), time-varying, physical system, the system response $v(\cdot,\tau)$ will be affected by configurations (changes) at times greater than $\tau$.

However,  that  does  not  mean  that  the  above  model  cannot  be  used.  The  system 
response,  by  its  definition,  must  incorporate  the  effects,  if  any,  of  the  subsequent  system 
changes.  So  the  question  becomes,  can  the  system  responses  typical  of  the  vocal  tract  be 
approximated  well  by  the  above  model? 

I believe so, since the amplitude of the response of the vocal tract to an impulse drops off very quickly relative to the rate of change in the vocal tract. For a fixed-configuration vocal tract, the pole-zero model has been shown to be good. Given the slow rate of change of the vocal tract and the rapid decrease in response, it is possible that for a given $\tau$, a pole-zero model can be found that will be a good approximation of the response of the time-varying vocal tract to an impulse at time $\tau$.


Appendix  B.  Additional  proofs 


B.1 Additional proofs from Chapter III

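Proof (Lemma 3.3.1). To show the first part of the lemma, note that if $\partial B(a,\delta) \subset K(a,r)$ then $B(a,\delta) \subset K(a,r)$. The set $\partial B(a,\delta)$ is given by

$$ \partial B(a,\delta) = \{ z \in \mathbb{C} : z = a + \delta e^{i\theta},\ \theta \in [0,2\pi) \} . $$

Therefore, what we want is the largest $\delta$ such that $\rho(a, a + \delta e^{i\theta}) < r$. That is,

$$ \rho(a, a + \delta e^{i\theta}) = \frac{\delta}{\left| 1 - |a|^2 - \bar{a}\delta e^{i\theta} \right|} < r , $$

which implies

$$ \delta < r \left| 1 - |a|^2 - \bar{a}\delta e^{i\theta} \right| . $$

The right-hand side is minimized by choosing $\theta = \arg a$, giving

$$ \delta < r\left(1 - |a|^2 - |a|\delta\right) , $$

which brings us to

$$ \delta < \frac{r(1-|a|^2)}{1 + |a|r} . $$

The second part of the lemma is shown in a similar way. Solving for $\rho(a, a + \delta e^{i\theta}) > r$, we see

$$ \rho(a, a + \delta e^{i\theta}) = \frac{\delta}{\left| 1 - |a|^2 - \bar{a}\delta e^{i\theta} \right|} > r , $$

which leads to

$$ \delta > r \left| 1 - |a|^2 - \bar{a}\delta e^{i\theta} \right| . $$

The right-hand quantity is maximized by choosing $\theta = \pi + \arg a$, giving

$$ \delta > r\left(1 - |a|^2 + |a|\delta\right) , $$

and leading to

$$ \delta > \frac{r(1-|a|^2)}{1 - |a|r} . $$

□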
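The two radii obtained in this proof can be checked numerically. The sketch below, which assumes the pseudo-hyperbolic metric $\rho(a,z) = |a-z|/|1-\bar{a}z|$ used in the computation above, samples a Euclidean circle about $a$ and reports the smallest and largest value of $\rho$ on it. The function names and the sample count are choices made for this example only.

```python
import cmath
import math


def rho(a, z):
    """Pseudo-hyperbolic distance between points a and z of the unit disk."""
    return abs(a - z) / abs(1 - a.conjugate() * z)


def inner_radius(a, r):
    """Radius from the first part of the proof: r(1 - |a|^2) / (1 + |a| r)."""
    return r * (1 - abs(a) ** 2) / (1 + abs(a) * r)


def outer_radius(a, r):
    """Radius from the second part of the proof: r(1 - |a|^2) / (1 - |a| r)."""
    return r * (1 - abs(a) ** 2) / (1 - abs(a) * r)


def circle_rho_range(a, delta, samples=720):
    """Smallest and largest rho(a, z) over sampled points z = a + delta * e^{i theta}."""
    values = [
        rho(a, a + delta * cmath.exp(2j * math.pi * k / samples))
        for k in range(samples)
    ]
    return min(values), max(values)


a, r = complex(0.3, 0.4), 0.4
print(circle_rho_range(a, inner_radius(a, r)))   # largest value should not exceed r
print(circle_rho_range(a, outer_radius(a, r)))   # smallest value should not fall below r
```

A sampled check of this kind is not a proof, but it is a quick way to catch a sign error in either radius.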


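Proof (Lemma 3.3.2). It is easily seen that $\rho(0, re^{i\theta}) = r$ for all $r \in [0,1)$ and $\theta \in [0,2\pi)$, which implies $K(0,r) = B(0,r)$. From Lemma 3.2.4, we have for all $z \in D$ that $\rho(0,z) = \rho(\phi_a(0), \phi_a(z)) = \rho(a, \phi_a(z))$. Therefore, $K(a,r)$ is the image of the set $B(0,r)$ under the transform $\phi_a$.

To show that this image is the set described above in the statement of the lemma, we will show that the boundaries of the sets are the same. For all $\theta \in [0,2\pi)$, we have $re^{i\theta} \in \partial B(0,r)$, which implies $\phi_a(re^{i\theta}) \in \partial K(a,r)$. The boundary of the set $B(\tilde{a},\tilde{r})$ is given by the set of points $z$ such that $|\tilde{a} - z| = \tilde{r}$. Examine the quantity $|\tilde{a} - \phi_a(re^{i\theta})|$. If this quantity is equal to $\tilde{r}$, then we will know that $\partial K(a,r) = \partial B(\tilde{a},\tilde{r})$.

$$
\begin{aligned}
\left| \tilde{a} - \phi_a(re^{i\theta}) \right|
 &= \left| a \left( \frac{1-r^2}{1-r^2|a|^2} \right) - \frac{a - re^{i\theta}}{1 - \bar{a}re^{i\theta}} \right| \\
 &= \left| \frac{r(1-|a|^2)(e^{i\theta} - ar)}{(1-r^2|a|^2)(1-\bar{a}re^{i\theta})} \right| \\
 &= \frac{r(1-|a|^2)}{1-r^2|a|^2} \left| \frac{e^{i\theta}(1 - are^{-i\theta})}{1-\bar{a}re^{i\theta}} \right| \\
 &= \frac{r(1-|a|^2)}{1-r^2|a|^2} \, \frac{\left|1 - are^{-i\theta}\right|}{\left|1-\bar{a}re^{i\theta}\right|} \\
 &= \frac{r(1-|a|^2)}{1-r^2|a|^2} \\
 &= \tilde{r} .
\end{aligned}
$$

Therefore, $\partial K(a,r) = \partial B(\tilde{a},\tilde{r})$. Since $K(a,r)$ and $B(\tilde{a},\tilde{r})$ are both convex, this gives that $K(a,r) = B(\tilde{a},\tilde{r})$. □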
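The identity at the heart of this proof is easy to test numerically. The sketch below assumes the transform $\phi_a(z) = (a-z)/(1-\bar{a}z)$ (the form consistent with the denominators in the computation above), together with the center $\tilde{a} = a(1-r^2)/(1-r^2|a|^2)$ and radius $\tilde{r} = r(1-|a|^2)/(1-r^2|a|^2)$. The function names are illustrative only.

```python
import cmath
import math


def phi(a, z):
    """Disk automorphism assumed in this sketch: phi_a(z) = (a - z) / (1 - conj(a) z)."""
    return (a - z) / (1 - a.conjugate() * z)


def euclidean_center(a, r):
    """Center of K(a, r) viewed as a Euclidean disk."""
    return a * (1 - r ** 2) / (1 - r ** 2 * abs(a) ** 2)


def euclidean_radius(a, r):
    """Radius of K(a, r) viewed as a Euclidean disk."""
    return r * (1 - abs(a) ** 2) / (1 - r ** 2 * abs(a) ** 2)


def max_boundary_error(a, r, samples=360):
    """Largest deviation of |center - phi_a(r e^{i theta})| from the radius over sampled theta."""
    center, radius = euclidean_center(a, r), euclidean_radius(a, r)
    worst = 0.0
    for k in range(samples):
        z = phi(a, r * cmath.exp(2j * math.pi * k / samples))
        worst = max(worst, abs(abs(center - z) - radius))
    return worst


# The deviation should be at the level of floating-point rounding.
print(max_boundary_error(complex(0.3, 0.4), 0.5))
```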


Proof (Lemma 3.3.3). Because $\rho(e^{i\theta}w, e^{i\theta}z) = \rho(w,z)$ for all $\theta \in \mathbb{R}$ and $w, z \in D$, without loss of generality, we may assume that $w \in \mathbb{R}^+$. Then, it is easily shown that $\rho(w,\bar{z}) = \rho(w,z)$. That is, $K(w,\delta)$ is symmetric about the real axis. This implies that if there is a unique $z \in D$ for which the maximal value $|z|$ is obtained, then $z \in \mathbb{R}$. Examination of the shape of $K(w,\delta)$, which is a closed disk, reveals that there is either a unique $z \in D$ such that the maximal value of $|z|$ is achieved, or else it is achieved for all $z \in \partial K(w,\delta)$. In either case, a point $z$ which obtains the maximum value $|z|$ will be on $\mathbb{R} \cap \partial K(w,\delta)$.

Since $z, w \in \mathbb{R} \cap D$, we have

$$ \rho(z,w) = \frac{|w - z|}{1 - wz} . $$

Assume $\rho(w,z) = \delta$ and $z > w$, to get

$$ \delta = \frac{z - w}{1 - zw} , $$

which implies one candidate for $z$ is $z_1$, defined by

$$ z_1 = z = \frac{\delta + w}{1 + \delta w} . $$

Next assume $\rho(w,z) = \delta$ and $z < w$, to get

$$ \delta = \frac{w - z}{1 - zw} , $$

which implies a second candidate for $z$ is $z_2$, defined by

$$ z_2 = z = \frac{w - \delta}{1 - \delta w} . $$

This gives two candidate values for $z$. In the case where $0 \le z_2 < w$, we have that $|z_2| < |z_1|$. Examining the case where $z_2 < 0$, we see that

$$
\begin{aligned}
|z_2| = -z_2 &= \frac{\delta - w}{1 - \delta w} \\
 &= \frac{(\delta - w)(1 + \delta w)}{(1 - \delta w)(1 + \delta w)} \\
 &= \frac{\delta - \delta w^2 + w(\delta^2 - 1)}{(1 - \delta w)(1 + \delta w)} \\
 &\le \frac{\delta - \delta w^2 + w(1 - \delta^2)}{(1 - \delta w)(1 + \delta w)} \\
 &= \frac{(\delta + w)(1 - \delta w)}{(1 - \delta w)(1 + \delta w)} \\
 &= \frac{\delta + w}{1 + \delta w} \\
 &= |z_1| .
\end{aligned}
$$

Therefore, the maximal value of $|z|$ subject to the constraint that $\rho(w,z) \le \delta$ is given by

$$ |z| = \frac{\delta + |w|}{1 + \delta|w|} . $$

To determine the value of $z$ corresponding to this $|z|$ for a specific $w$, note that $w = |w|e^{i \arg w}$, which implies $e^{i \arg w} = \frac{w}{|w|}$. Using this, we see that the $z$ corresponding to the maximal value of $|z|$ subject to the constraints is given by

$$ z = \frac{w}{|w|} \left( \frac{\delta + |w|}{1 + \delta|w|} \right) . $$

□
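The extremal point found in this proof can be sanity-checked with a few lines of code. The sketch below assumes $w \neq 0$, the metric $\rho(w,z) = |w-z|/|1-\bar{w}z|$, and the automorphism $\phi_w(\zeta) = (w-\zeta)/(1-\bar{w}\zeta)$ to generate boundary points of $K(w,\delta)$; the function names are illustrative and not part of the original development.

```python
import cmath
import math


def rho(w, z):
    """Pseudo-hyperbolic distance on the unit disk."""
    return abs(w - z) / abs(1 - w.conjugate() * z)


def phi(w, zeta):
    """Automorphism mapping the circle |zeta| = delta onto the boundary of K(w, delta)."""
    return (w - zeta) / (1 - w.conjugate() * zeta)


def extremal_point(w, delta):
    """Point of K(w, delta) with the largest modulus, for w != 0."""
    return (w / abs(w)) * (delta + abs(w)) / (1 + delta * abs(w))


def max_modulus_on_boundary(w, delta, samples=720):
    """Largest |z| over sampled boundary points of K(w, delta)."""
    return max(
        abs(phi(w, delta * cmath.exp(2j * math.pi * k / samples)))
        for k in range(samples)
    )


w, delta = complex(0.3, 0.4), 0.5
z_star = extremal_point(w, delta)
print(abs(z_star), rho(w, z_star), max_modulus_on_boundary(w, delta))
```

The extremal point should lie at pseudo-hyperbolic distance $\delta$ from $w$, and no sampled boundary point should have larger modulus.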

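Proof (Lemma 3.3.4). If $b \in K(a, r/2)$ then $\rho(a,b) \le r/2$. Next, $a \in B\!\left(z, \frac{r(1-|z|^2)}{4}\right)$ implies

$$ |a - z| < \frac{r(1-|z|^2)}{4} , $$

which in turn implies

$$
\begin{aligned}
\rho(a,z) = \frac{|a - z|}{|1 - \bar{a}z|}
 &< \frac{r(1-|z|^2)}{4|1 - \bar{a}z|} \\
 &\le \frac{r(1-|z|)(1+|z|)}{4(1 - |a||z|)} \\
 &\le \frac{r(1-|z|)(1+|z|)}{4(1 - |z|)} \\
 &= \frac{r}{4}(1 + |z|) \\
 &< \frac{r}{2} .
\end{aligned}
$$

Since $\rho$ is a metric, this gives

$$ \rho(b,z) \le \rho(b,a) + \rho(a,z) < r , $$

which implies $b \in K(z,r)$. □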


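Proof (Lemma 3.3.5). Assume $|a| > \delta$. Since $\rho(a,0) = |a|$, this implies $0 \notin K(a,\delta)$. Denote by $b$ one (of the two) points on $\partial K(a,\delta)$ where $\partial K(a,\delta)$ is tangent to a line through 0. Since line $0b$ is tangent to $\partial K(a,\delta)$ (a circle), the line $b\tilde{a}$ is perpendicular to the line $0b$, where (from Lemma 3.3.2) $\tilde{a}$ is the center of the circle $\partial K(a,\delta)$. The length of line segment $b\tilde{a}$ is given by the radius of $\partial K(a,\delta)$, which is (also from Lemma 3.3.2) $\tilde{r} = \frac{\delta(1-|a|^2)}{1-\delta^2|a|^2}$. Using the usual trigonometric calculations, this gives

$$
\begin{aligned}
\Delta\theta &= 2 \arcsin\left( \frac{\tilde{r}}{|\tilde{a}|} \right) \\
 &= 2 \arcsin\left( \frac{\delta(1-|a|^2)}{(1-\delta^2)|a|} \right) \\
 &\le 2 \arcsin\left( \frac{2\delta(1-|a|)}{(1-\delta^2)|a|} \right) .
\end{aligned}
$$

Since $\frac{2\delta(1-|a|)}{(1-\delta^2)|a|} \to 0$ as $|a| \to 1$ and $\sin x \approx x$ as $x \to 0$, we have that there exists $r \in (0,1)$ such that for $C > 1$,

$$
\begin{aligned}
\Delta\theta &\le 2C \left( \frac{2\delta}{(1-\delta^2)|a|} \right)(1 - |a|) \\
 &\le C \left( \frac{4\delta}{(1-\delta^2)\,r} \right)(1 - |a|)
\end{aligned}
$$

for all $a \in D$ such that $r < |a|$. □

Proof (Lemma 3.3.6). For the first inequality, fix $a$ and consider the quantity $\frac{1}{1-|a||z|}$. Clearly this quantity is maximized when $|z|$ is maximized. By Lemma 3.3.3, we know that subject to the constraint that $\rho(a,z) \le r$, we have $|z| \le \frac{r+|a|}{1+r|a|}$. Therefore, for all $z \in D$ such that $\rho(a,z) \le r$, we have

$$
\begin{aligned}
\frac{1}{1-|a||z|} &\le \frac{1}{1 - |a|\frac{r+|a|}{1+r|a|}} \\
 &= \frac{1 + r|a|}{1 - |a|^2} \\
 &\le \frac{1 + r}{1 - |a|^2} \\
 &\le \frac{2}{1 - |a|^2} .
\end{aligned}
$$

For the second inequality, note that by Lemma 3.3.2, we have

$$ \frac{1}{m(K(a,r))} = \frac{(1-r^2|a|^2)^2}{r^2(1-|a|^2)^2} . $$

From an intermediate step in the first part of this proof, we have that given our constraints on $\rho(a,z)$,

$$ \frac{1}{1-|a||z|} \le \frac{1+r|a|}{1-|a|^2} , $$

which gives that

$$
\begin{aligned}
\frac{1-|a|^2}{r^2(1-|a||z|)^2}
 &\le \frac{(1-|a|^2)(1+r|a|)^2}{r^2(1-|a|^2)^2} \\
 &= \frac{(1-r|a|)}{(1-r|a|)} \, \frac{(1+|a|)(1-|a|)(1+r|a|)^2}{r^2(1-|a|^2)^2} \\
 &\le \frac{2(1-r|a|)^2(1+r|a|)^2}{r^2(1-r|a|)(1-|a|^2)^2} \\
 &\le \frac{2}{(1-r)} \, \frac{(1-r^2|a|^2)^2}{r^2(1-|a|^2)^2} \\
 &= \frac{2}{(1-r)} \, \frac{1}{m(K(a,r))} .
\end{aligned}
$$

□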


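Proof (Lemma 3.3.7). Given

$$ m(K(a,r)) = \frac{r^2(1-|a|^2)^2}{(1-r^2|a|^2)^2} $$

and

$$ m(K(a,\epsilon r)) = \frac{\epsilon^2 r^2(1-|a|^2)^2}{(1-\epsilon^2 r^2|a|^2)^2} , $$

we have

$$
\begin{aligned}
\frac{m(K(a,r))}{m(K(a,\epsilon r))}
 &= \frac{(1-\epsilon^2 r^2|a|^2)^2}{\epsilon^2(1-r^2|a|^2)^2} \\
 &= \frac{(1-\epsilon r|a|)^2(1+\epsilon r|a|)^2}{\epsilon^2(1-r|a|)^2(1+r|a|)^2} \\
 &\le \frac{(1-\epsilon r|a|)^2(1+r|a|)^2}{\epsilon^2(1-r|a|)^2(1+r|a|)^2} \\
 &= \frac{(1-\epsilon r|a|)^2}{\epsilon^2(1-r|a|)^2} .
\end{aligned}
$$

Since the derivative of this quantity with respect to $|a|$ is strictly positive in the range $0 < |a| < 1$, we know that the supremum occurs on the boundary ($|a| = 1$). This gives

$$ \frac{m(K(a,r))}{m(K(a,\epsilon r))} \le \frac{(1-\epsilon r)^2}{\epsilon^2(1-r)^2} , $$

which implies

$$ \frac{1}{m(K(a,\epsilon r))} \le \frac{(1-\epsilon r)^2}{\epsilon^2(1-r)^2} \, \frac{1}{m(K(a,r))} . $$

□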


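Proof (Lemma 3.3.8). Assume

$$ \rho(z,w) = \frac{|z-w|}{|1-\bar{z}w|} < \frac{1}{3} . $$

This implies

$$
\begin{aligned}
3|z-w| &< |1-\bar{z}w| \\
 &= |1 - \bar{z}(z - (z-w))| \\
 &= |1 - |z|^2 + \bar{z}(z-w)| \\
 &\le (1-|z|^2) + |z||z-w| ,
\end{aligned}
$$

which implies

$$ (3-|z|)|z-w| < (1-|z|)(1+|z|) , $$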


which  in  turn  gives 


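Proof (Lemma 3.3.9). For the first inequality, fix $w \in D$. We then have

$$ m(K(w,r)) = \frac{r^2(1-|w|^2)^2}{(1-r^2|w|^2)^2} . $$

The quantity $\frac{m(K(w,r))}{m(K(z,r))}$ can be maximized by minimizing $m(K(z,r))$. This in turn is done by maximizing $|z|$ subject to the constraint that $\rho(w,z) \le \delta$. By Lemma 3.3.3, we have that

$$ |z| \le \frac{\delta + |w|}{1 + \delta|w|} , $$

which implies

$$ m(K(z,r)) \ge \frac{r^2(1-|w|^2)^2(1-\delta^2)^2}{\left((1+\delta|w|)^2 - r^2(\delta+|w|)^2\right)^2} . $$

Thus, we have

$$
\begin{aligned}
\frac{m(K(w,r))}{m(K(z,r))}
 &\le \frac{\left((1+\delta|w|)^2 - r^2(\delta+|w|)^2\right)^2}{(1-\delta^2)^2(1-r^2|w|^2)^2} \\
 &= \frac{\left((1-r^2\delta^2) + 2|w|(\delta - r^2\delta) + |w|^2(\delta^2 - r^2)\right)^2}{(1-\delta^2)^2(1-r^2|w|^2)^2} \\
 &\le \frac{\left((1-r^2\delta^2) + 2|w|(1-r^2\delta^2) + |w|^2(1-r^2\delta^2)\right)^2}{(1-\delta^2)^2(1-r^2|w|^2)^2} \\
 &= \frac{(1-r^2\delta^2)^2(1+|w|)^4}{(1-\delta^2)^2(1-r^2|w|^2)^2} \\
 &\le \frac{16(1-r^2\delta^2)^2}{(1-\delta^2)^2(1-r^2)^2} .
\end{aligned}
$$

Therefore,

$$ \frac{1}{m(K(z,r))} \le \frac{16(1-r^2\delta^2)^2}{(1-\delta^2)^2(1-r^2)^2} \, \frac{1}{m(K(w,r))} $$

whenever $\rho(w,z) \le \delta$.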

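For the second inequality, fix $z \in D$. To minimize the quantity $\frac{1}{|1-\bar{w}z|}$, we maximize $|1-\bar{w}z|$ subject to the constraint that $\rho(z,w) \le \delta$. With reasoning similar to that in Lemma 3.3.3, we see that such a $w$ will be a multiple of $z$, that is, $w = x\frac{z}{|z|}$ for some $x \in \mathbb{R}$. Solving for $x$ subject to the constraint, we have

$$ \delta = \rho(z,w) = \frac{\left| z - x\frac{z}{|z|} \right|}{\left| 1 - |z|x \right|} = \frac{\big| |z| - x \big|}{1 - |z|x} ; $$

this gives for $x$,

$$ x = \frac{|z| \pm \delta}{1 \pm \delta|z|} . $$

Examining both values, we find that the maximizer of $|1-\bar{w}z|$ is $x = \frac{|z| - \delta}{1 - \delta|z|}$, giving

$$ \frac{1}{|1-\bar{w}z|} \ge \frac{1-\delta|z|}{1-|z|^2} \ge \frac{1-\delta}{1-|z|^2} . $$

Therefore

$$ \frac{1}{1-|z|^2} \le \frac{1}{(1-\delta)\,|1-\bar{w}z|} . $$

□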

B.2  Proof  of  Theorem  3.6.1 

First,  the  following  lemma  will  be  needed  in  the  proof  of  Theorem  3.6.1. 

Lemma B.2.1 Let $E \subset X$ be a convex set and let $T: X \to Y$ be a linear transform. Then, the image of the set $E$ under the mapping $T$, denoted $T[E]$, is convex.

Proof. Let $y_1, y_2 \in T[E]$ and let $0 < \alpha < 1$. What is necessary to show is that $\alpha y_1 + (1-\alpha)y_2 \in T[E]$.

Since $y_1, y_2 \in T[E]$, there exist $x_1, x_2 \in E$ such that $y_1 = Tx_1$ and $y_2 = Tx_2$. Since $E$ is convex, we have $\alpha x_1 + (1-\alpha)x_2 \in E$. Examining $T(\alpha x_1 + (1-\alpha)x_2)$, we see that

$$
\begin{aligned}
T(\alpha x_1 + (1-\alpha)x_2) &= \alpha Tx_1 + (1-\alpha)Tx_2 \\
 &= \alpha y_1 + (1-\alpha)y_2 \in T[E] .
\end{aligned}
$$

Since $y_1$, $y_2$, and $0 < \alpha < 1$ were arbitrarily chosen, this implies that $T[E]$ is convex. □

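Proof (Theorem 3.6.1). The proof is an elaboration of that in [33], which itself is an adaptation of that in [5].

(1 ⇒ 2) Assume 1 is true. Suppose there exists $f \in X_1$ such that $mf \notin \overline{E_{1,p}}$. Since $E_{1,p}$ is the image of a convex set, that is, the unit ball of $\ell_{1,p}$ intersected with the set of finitely non-zero sequences, under a linear transform, by Lemma B.2.1, it is convex, implying $\overline{E_{1,p}}$ is convex also. Then, by the Hahn-Banach separation theorem, there exists $\Phi \in X^*$ such that for all $h \in \overline{E_{1,p}}$, $|\Phi(h)| \le \gamma < m|\Phi(f)|$. Since it is true for all $h \in \overline{E_{1,p}}$, it is certainly true for $x_n \in E_{1,p}$ of the form $x_n = \sum_{k=1}^{k(n)} \lambda_{n,k}u_{n,k}$ where $\sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \le 1$. Evaluating $\Phi(x_n)$, we have

$$ \Phi(x_n) = \sum_{k=1}^{k(n)} \lambda_{n,k}\Phi(u_{n,k}) . $$

Observe that this is in the form of a functional $\Lambda_n = \{\lambda_{n,k}\}_{k=1}^{k(n)}$ operating on the sequence $\{\Phi(u_{n,k})\}_{k=1}^{k(n)}$. Choose $\Lambda_n$ so that it is aligned with the sequence and $\sum_{k=1}^{k(n)} |\lambda_{n,k}|^p = 1$. Then

$$
\begin{aligned}
\Phi(x_n) &= \sum_{k=1}^{k(n)} \lambda_{n,k}\Phi(u_{n,k}) \\
 &= \left( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \right)^{1/p} \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \\
 &= \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \le \gamma < m\|\Phi\| .
\end{aligned}
$$

Since this is true for all $n$, this implies that

$$ \sup_n \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \le \gamma < m\|\Phi\| , $$

which is a contradiction of 1. Therefore, $mX_1 \subset \overline{E_{1,p}}$.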


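To prove the other half of 2, let $x \in E_{1,p}$, where $x = \sum_{n=1}^{N} \sum_{k=1}^{k(n)} \lambda_{n,k}u_{n,k}$ and where $\sum_{n=1}^{N} \left( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \right)^{1/p} \le 1$. Computing $\|x\|$, we have that

$$
\begin{aligned}
\|x\| &= \sup_{\|\Phi\| \le 1} |\Phi(x)| \\
 &= \sup_{\|\Phi\| \le 1} \left| \sum_{n=1}^{N} \sum_{k=1}^{k(n)} \lambda_{n,k}\Phi(u_{n,k}) \right| \\
 &\le \sup_{\|\Phi\| \le 1} \sum_{n=1}^{N} \left| \sum_{k=1}^{k(n)} \lambda_{n,k}\Phi(u_{n,k}) \right| \\
 &\le \sup_{\substack{\Phi \in X^* \\ \|\Phi\| \le 1}} \sum_{n=1}^{N} \left( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \right)^{1/p} \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \\
 &\le \left[ \sum_{n=1}^{N} \left( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \right)^{1/p} \right] \sup_{\|\Phi\| \le 1} \left[ \sup_n \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \right] \\
 &\le M \sup_{\|\Phi\| \le 1} \|\Phi\| \le M .
\end{aligned}
$$

Therefore, $x \in MX_1$. Since this is true for each $x \in E_{1,p}$, it is also true for each $x \in \overline{E_{1,p}}$, and so $\overline{E_{1,p}} \subset MX_1$.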

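(2 ⇒ 3)

Assume 2. Let $f_0 \in X$ and fix $\epsilon > 0$. Since $mX_1 \subset \overline{E_{1,p}}$, there exists $h_0 \in E_{1,p}$, $h_0 = \sum_{n=1}^{N(0)} \sum_{k=1}^{k(n)} \lambda_{0,n,k}u_{n,k}$ with $\sum_{n=1}^{N(0)} \left( \sum_{k=1}^{k(n)} |\lambda_{0,n,k}|^p \right)^{1/p} \le 1$, such that for $f_1 = f_0 - \frac{\|f_0\|}{m}h_0$, $\|f_1\| \le \epsilon/2$.

Repeat this process to get for each $j$, $h_j = \sum_{n=1}^{N(j)} \sum_{k=1}^{k(n)} \lambda_{j,n,k}u_{n,k} \in E_{1,p}$ such that for $f_{j+1} = f_j - \frac{\|f_j\|}{m}h_j$, $\|f_{j+1}\| \le \epsilon/2^{j+1}$.

Since for each $n$, $f_0 = f_{n+1} + \sum_{j=0}^{n} \frac{\|f_j\|}{m}h_j$ and $\lim_{n\to\infty} \|f_n\| = 0$, we have that

$$ f_0 = \lim_{n\to\infty} \sum_{j=0}^{n} \frac{\|f_j\|}{m}h_j . $$

Defining $x_n$ by

$$
\begin{aligned}
x_n &= \sum_{j=0}^{n} \frac{\|f_j\|}{m} \sum_{i=1}^{N(j)} \sum_{k=1}^{k(i)} \lambda_{j,i,k}u_{i,k} \\
 &= \sum_{i=1}^{\max\{N(j) : j = 0,\ldots,n\}} \sum_{k=1}^{k(i)} \left( \sum_{j=0}^{n} \frac{\|f_j\|}{m}\lambda_{j,i,k} \right) u_{i,k} ,
\end{aligned}
$$

and defining the transform $T: \ell_{1,p} \to X$ by

$$ T\{\lambda_{n,k}\} = \sum_{n=1}^{\infty} \sum_{k=1}^{k(n)} \lambda_{n,k}u_{n,k} , $$

we see that

$$ x_n = T\left\{ \sum_{j=0}^{n} \frac{\|f_j\|}{m}\lambda_{j,i,k} \right\} = T\gamma_n , $$

where

$$ \gamma_n = \left\{ \sum_{j=0}^{n} \frac{\|f_j\|}{m}\lambda_{j,i,k} \right\} . $$

Now, if we can show that $\{\gamma_n\}_{n=1}^{\infty}$ is a Cauchy sequence in $\ell_{1,p}$, we will have that it converges to a limit sequence, denoted $\lambda_\epsilon$.

Examining $\|\gamma_n - \gamma_{n-1}\|_{1,p}$, we see that

$$ \|\gamma_n - \gamma_{n-1}\|_{1,p} = \left\| \frac{\|f_n\|}{m}\{\lambda_{n,i,k}\} \right\|_{1,p} \le \frac{\|f_n\|}{m} \le \frac{1}{2^n}\frac{\epsilon}{m} . $$

Therefore,

$$
\begin{aligned}
\|\gamma_n - \gamma_{n'}\|_{1,p} &\le \|\gamma_n - \gamma_{n-1}\|_{1,p} + \|\gamma_{n-1} - \gamma_{n-2}\|_{1,p} + \cdots + \|\gamma_{n'+1} - \gamma_{n'}\|_{1,p} \\
 &\le \frac{\epsilon}{m} \sum_{i=n'+1}^{n} \frac{1}{2^i} \\
 &= \frac{1}{2^{n'}}\frac{\epsilon}{m} \sum_{i=1}^{n-n'} \frac{1}{2^i} \\
 &< \frac{1}{2^{n'}}\frac{\epsilon}{m} .
\end{aligned}
$$

Since this value depends only on $n'$, and decreases with increasing $n'$, $\{\gamma_n\}_{n=1}^{\infty}$ is a Cauchy sequence in a complete space, and hence converges to some $\lambda \in \ell_{1,p}$ with

$$ \|\lambda\|_{1,p} \le \frac{\|f_0\| + \epsilon}{m} . $$

Since $\epsilon$ was arbitrarily chosen, this implies that

$$ \inf_{\lambda \in \Lambda(E,f)} \|\lambda\|_{1,p} \le \frac{\|f\|}{m} $$

for all $f \in X$.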

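(3 ⇒ 1)

Assume 3. Let $\Phi \in X^*$ and let $f_n = \sum_{k=1}^{k(n)} \lambda_{n,k}u_{n,k}$ where $\sum_{k=1}^{k(n)} |\lambda_{n,k}|^p = 1$ for each $n$ and where the sequence $\{\lambda_{n,k}\}_{k=1}^{k(n)}$ (when treated as a functional) is aligned with the sequence $\{\Phi(u_{n,k})\}_{k=1}^{k(n)}$. Clearly, $f_n \in E_{1,p}$. Then,

$$
\begin{aligned}
|\Phi(f_n)| &= \left| \sum_{k=1}^{k(n)} \lambda_{n,k}\Phi(u_{n,k}) \right| \\
 &= \left( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \right)^{1/p} \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \\
 &= \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} .
\end{aligned}
$$

However, for all $x \in X$,

$$ |\Phi(x)| \le \|\Phi\|\,\|x\| , $$

which implies

$$ |\Phi(f_n)| = \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \le \|\Phi\|\,\|f_n\| \le M\|\Phi\| . $$

This gives the right-hand inequality of 1.

Now, fix $\epsilon > 0$ and choose $f \in E_{1,p}$ such that $\|f\| \le 1$ and $|\Phi(f)| \ge \|\Phi\| - \epsilon$. By 3, there exists $\lambda \in \ell_{1,p}$ such that $\sum_{n=1}^{N} \left( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \right)^{1/p} \le \frac{\|f\|}{m} + \epsilon \le \frac{1}{m} + \epsilon$. Then,

$$
\begin{aligned}
\|\Phi\| - \epsilon &\le |\Phi(f)| \\
 &= \left| \Phi\left( \sum_{n=1}^{N} \sum_{k=1}^{k(n)} \lambda_{n,k}u_{n,k} \right) \right| \\
 &\le \sum_{n=1}^{N} \sum_{k=1}^{k(n)} |\lambda_{n,k}|\,|\Phi(u_{n,k})| \\
 &\le \sum_{n=1}^{N} \left( \sum_{k=1}^{k(n)} |\lambda_{n,k}|^p \right)^{1/p} \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} \\
 &\le \left( \frac{1}{m} + \epsilon \right) \sup_n \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} ,
\end{aligned}
$$

which gives

$$ \frac{\|\Phi\| - \epsilon}{\left( \frac{1}{m} + \epsilon \right)} \le \sup_n \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} . $$

Since $\epsilon$ was arbitrarily chosen, this gives that

$$ m\|\Phi\| \le \sup_n \left( \sum_{k=1}^{k(n)} |\Phi(u_{n,k})|^q \right)^{1/q} . $$

□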


Appendix  C.  Heuristic  algorithm  for  finding  glottal  pulses 

Presented in this section is the algorithm which was used in the computer application described in Chapter VII to estimate the glottal pulse starting locations in samples of speech. Unfortunately, an error was found in the implementation of the algorithm at a date too late to allow for the computer results to be reexamined. However, based on the nature of the error (the weighting function varied slightly from that shown below) and the fact that the performance did not change significantly on a small test case, I feel that no significant changes would be seen with the corrected version of the computer application.

The  algorithm  presented  here  is  strictly  heuristic.  It  is  based  on  some  observations 
made  during  various  classroom  projects,  modified  by  trial- and- error.  It  has  not  been  tested 
on  a  wide  variety  of  speech  or  speakers. 

In  creating  this  algorithm,  the  primary  concern  was  to  have  a  method  by  which  the 
approximate  locations  of  glottal  pulses  could  be  found  automatically  (i.e.,  by  computer), 
in  speech  corrupted  by  noise  as  well  as  in  clean  speech.  Of  course,  it  was  desired  that 
the  performance  be  reasonably  reliable,  but  since  this  dissertation  did  not  hinge  on  this 
algorithm,  it  was  only  necessary  that  it  perform  acceptably  on  the  speech  samples  with 
which  I  was  working.  The  determination  of  what  constitutes  acceptable  behavior  was  based 
on  my  estimate  of  where  the  glottal  pulses  were  probably  starting.  No  attempt  has  been 
made  to  verify  that  my  estimate  of  these  locations  was  accurate. 

There are two steps in this algorithm. The first step creates, from the speech signal, a signal in which the approximate glottal pulse start times can be seen easily by inspection, appearing as local minima of the signal. The second step finds these local minima within reasonably sized local time windows.

The intent of the first step is to create, from the speech signal $s$, a new signal, $s_1$, in which the pitch periods can more easily be seen and hence, from which the approximate glottal pulse start points can be extracted. This step uses a weighting function, $w$, described by

w(t) 


'  (Jf  +  800(4  -  4„))  c-»°(«-*.> 

e-250 (<-«„) 

—800(4  -  to)e~250^-t^ 

0 


-155  <(<-**)< 
800  —  _ 


125 

- L_  +  -l 

1  I 


125 

—  ^<(t  —  t0)< 


125  '  800 

1  )  < _ L_ 

l°)  -  soo 

0 


otherwise 




Figure  65.  Weighting  Function  used  in  Glottal  Pulse  Finding  Heuristic  Algorithm 

where  t0  =  ~~  is  an  offset  value  used  for  alignment  purposes.  The  weighting  function,  w , 
is  plotted  in  Figure  65. 

The operation performed in the first step of the algorithm is

$$ s_1(t) = \int_{-\infty}^{\infty} \left| s(\tau)\, w(t-\tau) \right|^2 d\tau . $$

This can also be considered a convolution, as in

$$ s_1 = |s|^2 * w^2 . $$
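For sampled signals, the first step reduces to a discrete convolution of the squared magnitudes. The following is a minimal sketch under several assumptions that are not part of the original application: the signals are uniformly sampled with spacing `dt`, the weighting function is supplied by the caller as a list of samples `w`, and `w_origin` is the index of the sample of `w` corresponding to a zero argument. The function name is illustrative.

```python
def weighted_energy(s, w, dt, w_origin=0):
    """Discrete approximation of s1(t_i) = integral of |s(tau) w(t_i - tau)|^2 d tau.

    s        : list of speech samples s(tau_j), tau_j = j * dt
    w        : list of weighting-function samples (caller-supplied; assumed here)
    dt       : sample spacing
    w_origin : index k such that w[k] is the sample of w at argument zero
    """
    s_sq = [abs(value) ** 2 for value in s]
    w_sq = [abs(value) ** 2 for value in w]
    n, m = len(s), len(w)
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            k = i - j + w_origin        # index of the sample w(t_i - tau_j)
            if 0 <= k < m:              # outside its support, w is taken to be zero
                acc += s_sq[j] * w_sq[k]
        out.append(acc * dt)
    return out


# Tiny demonstration with made-up numbers (not speech data).
print(weighted_energy([0, 0, 2, 0, 0], [1, 0.5], 1.0))
```

The direct double loop is quadratic in the signal length; it is written this way for clarity, and a faster convolution routine could be substituted without changing the result.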

The second step of the algorithm consists of finding relative minima in the signal resulting from the first step. These relative minima roughly correspond to the starting points of glottal pulses. The minima are located in windows of width 3/500 seconds, asymmetrically centered about each time. This can best be described by the boolean-valued function, $s_2$, given by

$$
s_2(t) = \begin{cases}
\text{True} & s_1(t) = \min\{ s_1(t+\tau) : \tau \text{ in the window about } t \} \\
\text{False} & \text{otherwise}
\end{cases}
\qquad (77)
$$

Times $t$ for which $s_2(t)$ is true are the estimated start times of glottal pulses.

While the result of the second step is well behaved for most true speech signals, it fails in the case of some non-speech signals, specifically, in the case of constant functions, where $s_2(t) = \text{True}$ for every time $t$. To avoid the practical aspects of this problem, it is desirable to modify this definition so that $s_2(t) = \text{True}$ only if $s_1(t) < s_1(t - \delta t)$ for every sufficiently small $\delta t > 0$. This results in no pulses being estimated to be in constant portions of the signal.
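The second step, including the modification for constant portions, might be sketched as follows for a sampled $s_1$. The window is expressed in samples as `back` samples before and `fwd` samples after each candidate time; the split between the two is an assumption of this sketch, since only the total width of 3/500 seconds is fixed by the description above. The strict comparison with the preceding sample is the discrete stand-in for the condition $s_1(t) < s_1(t - \delta t)$.

```python
def find_pulse_starts(s1, back, fwd):
    """Indices i where s1[i] is the window minimum and strictly below s1[i - 1].

    s1   : list of samples of the first-step signal
    back : number of samples in the window before index i (must be >= 1)
    fwd  : number of samples in the window after index i
    Indices closer than `back` or `fwd` samples to the ends are not examined,
    so pulses too close to the end points of the sample are missed.
    """
    if back < 1:
        raise ValueError("back must be at least 1")
    hits = []
    for i in range(back, len(s1) - fwd):
        window = s1[i - back : i + fwd + 1]
        is_window_minimum = s1[i] == min(window)
        strictly_descending = s1[i] < s1[i - 1]   # rejects constant stretches
        if is_window_minimum and strictly_descending:
            hits.append(i)
    return hits


# Made-up first-step signal with two clear minima.
print(find_pulse_starts([5, 4, 3, 1, 3, 4, 5, 2, 4, 5, 6], back=2, fwd=2))
```

For a signal sampled at rate `fs`, a window of width 3/500 seconds corresponds to `back + fwd` of roughly `3 * fs / 500` samples.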

Note  that  given  the  constants  in  the  algorithm  described  above,  glottal  pulses  which 
are  spaced  roughly  2/500  second  apart  or  less  (corresponding  to  a  vocal  pitch  of  around 
250  Hz  or  greater)  will  confuse  the  algorithm.  That  is,  some  pulses  will  be  detected  and 
others  not.  An  example  of  an  occurrence  of  this  is  seen  below  in  Figure  67. 

Shown  in  Figure  66  are  results  from  this  algorithm  on  a  sample  of  the  phoneme  /IY/ 
from  a  female  speaker  and  a  sample  of  the  phoneme  /OY/  from  a  male  speaker.  Both 
samples  were  of  clean  (not  corrupted  by  noise)  speech.  The  pitch  of  the  female  speaker’s 
voice  is  close  to  250  Hz  -  almost  too  high  to  be  appropriate  for  use  with  this  heuristic 
algorithm.  The  estimated  starting  points  are  marked  by  vertical  grid  lines. 

As  seen  in  this  Figure,  the  signal  resulting  from  the  first  step  of  this  algorithm  shows 
clear  minima  where  one  might  guess  the  true  glottal  pulse  starting  locations  to  be.  Note 
that  the  glottal  pulses  too  close  to  the  end  points  of  the  sample  will  be  missed  by  the 
algorithm.  Although  the  pitch  of  the  female  speaker’s  voice  was  high,  the  algorithm  still 
was  able  to  estimate  reliably  the  glottal  pulse  starting  locations. 

Figure  67  shows  the  results  of  the  algorithm  on  the  same  phoneme  samples  corrupted 
with  Gaussian  white  noise  of  10  dB,  6  dB,  3  dB,  and  0  dB  Signal-to-Noise  Ratios  (SNR). 
The  glottal  pulses  estimated  based  on  the  corrupt  speech  are  marked  with  solid  vertical 
grid  lines,  while  those  estimated  based  on  the  clean  speech  are  marked  by  dashed  vertical 
grid  lines  for  comparison. 

Figure 66. Glottal Pulses Found in Samples of Clean Speech. Vertical grid lines mark the estimated locations of glottal pulses as found by the algorithm. (a) A segment of the phoneme /IY/. (b) A segment of the phoneme /OY/. (c) The result of the first step of the algorithm on the phoneme /IY/. (d) The result of the first step of the algorithm on the phoneme /OY/.

The effects of the increasing noise levels can be seen easily, especially in the results for the phoneme /IY/. With increasing noise levels, the minima become less apparent, making automatic detection problematic. Also, the addition of noise causes the minima of the signal resulting from the first step to vary in location somewhat. For the phoneme /OY/, from a lower pitched (male) speaker, this resulted in some small variations in the estimated glottal pulse starting location. For the phoneme /IY/, from a higher pitched (female) speaker, this variation was enough to cause the second step of the algorithm to miss some glottal pulses completely.

I believe that the algorithm could be easily modified to work better on higher pitched voices; however, this work has not yet been done. One possible approach would be to reduce the window size used in the definition of $s_2$ in (77).


Figure 67. Glottal Pulses Found in Samples of Noisy Speech. Shown is the result of the first step of the algorithm on speech corrupted with varying amounts of noise. Solid vertical grid lines mark the estimated locations of glottal pulses as found by the algorithm. Dashed vertical grid lines mark the pulse locations found for the clean sample of speech. Results for (a) phoneme /IY/, 10 dB SNR noise, (b) phoneme /OY/, 10 dB SNR noise, (c) phoneme /IY/, 6 dB SNR noise, (d) phoneme /OY/, 6 dB SNR noise, (e) phoneme /IY/, 3 dB SNR noise, (f) phoneme /OY/, 3 dB SNR noise, (g) phoneme /IY/, 0 dB SNR noise, and (h) phoneme /OY/, 0 dB SNR noise.


Bibliography 


1. Adams, Robert A. Sobolev Spaces. Pure and Applied Mathematics, 65, New York: Academic Press, 1975.

2. Ahlfors, Lars V. Complex Analysis (2nd Edition). New York: McGraw-Hill Book Company, 1966.

3. Baer, Thomas and Brian C. J. Moore. “Effects of spectral smearing on the intelligibility of sentences in noise,” Journal of the Acoustical Society of America, 94(3):1229-1241 (September 1993).

4. Benedetto, John J. and William Heller. Mathematical Notes, 10, Supplement 1, chapter Irregular Sampling and the Theory of Frames, I, 103-125. Princeton, NJ: Princeton University Press, 1990.

5. Bonsall, F. F. “A General Atomic Decomposition Theorem and Banach’s Closed Range Theorem,” The Quarterly Journal of Mathematics, Oxford Second Series, 42:9-14 (1991).

6. Cheng, Yan-Ming and Douglas O’Shaughnessy. “On 450-600 b/s Natural Sounding Speech Coding,” IEEE Transactions on Speech and Audio Processing, 1(2):207-220 (April 1993).

7. Conway, John B. Functions of One Complex Variable (2nd Edition). Graduate Texts in Mathematics, 11, New York: Springer-Verlag New York, 1978.

8. D’Angelo, Henry. Linear Time Varying Systems: Analysis and Synthesis. The Allyn and Bacon series in Electrical Engineering, Boston: Allyn and Bacon, 1970.

9. Daubechies, Ingrid. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, 61, Philadelphia, PA: Society for Industrial and Applied Mathematics, 1992.

10. Donoho, David L. and Iain M. Johnstone. “Adapting to Unknown Smoothness via Wavelet Shrinkage,” Journal of the American Statistical Association, 90(432):1200-1224 (1995).

11. Duffin, R. J. and A. C. Schaeffer. “A Class of Nonharmonic Fourier Series,” Transactions of the American Mathematical Society, 72:341-366 (March 1952).

12. Duren, Peter L. Theory of Hp Spaces. Pure and Applied Mathematics, 38, New York: Academic Press, 1970.

13. Gupta, Sunil K. and Juergen Schroeter. “Pitch-synchronous frame-by-frame and segment-based articulatory analysis by synthesis,” Journal of the Acoustical Society of America, 94(5):2517-2530 (November 1993).

14. Hayashi, Shinji and Nobuhiko Kitawaki. “An objective quality assessment for bit-reduction coding of wideband speech,” Journal of the Acoustical Society of America, 92(1):106-113 (July 1992).

15. Helmberg, Gilbert. Introduction to Spectral Theory in Hilbert Space. North-Holland Series in Applied Mathematics and Mechanics, 6, New York: Wiley, 1969.


16.  Kewley-Port,  Diane  and  Charles  S.  Watson.  “Formant-frequency  discrimination  for 
isolated  English  vowels,”  Journal  of  the  Acoustical  Society  of  America,  05(l):485-496 
(January  1994). 

17.  Kleijn,  W.  Bastiaan.  “Encoding  Speech  Using  Prototype  Waveforms,”  IEEE  Trans¬ 
actions  on  Speech  and  Audio  Processing ,  i(4):386-399  (October  1993). 

18.  Krishnamurthy,  Ashok  K.  “Glottal  Source  Estimation  Using  a  Sum-of-Exponentials 
Model,”  IEEE  Transactions  on  Signal  Processing ,  ^0:682-686  (March  1992). 

19.  Leek,  Marjorie  R.  and  Van  Summers.  “The  effect  of  temporal  waveform  shape  on 
spectral  discrimination  by  normal-hearing  and  hearing-impaired  listeners,”  Journal  of 
the  Acoustical  Society  of  America,  9f( 4):2074-2082  (October  1993). 

20. Luecking, Daniel H. “Representation and Duality in Weighted Spaces of Analytic
Functions,” Indiana University Mathematics Journal, 34(2):319-336 (1985).

21.  Moore,  Brian  C.  J.  An  Introduction  to  the  Psychology  of  Hearing  (3rd  Edition). 
Academic  Press,  1989. 

22. Ninness, Brett and Fredrik Gustafsson. “A Unifying Construction of Orthonormal
Bases for System Identification,” IEEE Transactions on Automatic Control,
^(4):515-21  (April  1996). 

23.  Parsons,  Thomas  W.  Voice  and  Speech  Processing.  McGraw-Hill  Series  in  Electrical 
Engineering,  CAD/CAM,  Robotics,  and  Computer  Vision,  New  York:  McGraw-Hill, 
1986. 

24. Pati, Y. C., et al. “Orthogonal Matching Pursuit: Recursive Function Approximation
with Applications to Wavelet Decomposition.” Conference Record of the Twenty-Seventh
Asilomar Conference on Signals, Systems and Computers 1. 40-44. 1993.

25. Quackenbush, Schuyler R., et al. Objective Measures of Speech Quality. Prentice Hall
Advanced Reference Series, Englewood Cliffs, NJ: Prentice-Hall, 1988.

26. Quatieri, Thomas F. and Robert J. McAulay. “Shape Invariant Time-Scale and Pitch
Modification of Speech,” IEEE Transactions on Signal Processing, 40(3):497-510
(March 1992).

27. Savic, Michael, et al. “Co-channel Speaker Separation Based on Maximum-Likelihood
Deconvolution.” Proceedings of the ICASSP, Vol 1. 25-28. 1994.

28. Teas, Donald C. Foundations of Modern Auditory Theory, I, chapter 7. New York:
Academic  Press,  1970.  Jerry  V.  Tobias,  editor. 

29. ter Keurs, Mariken, et al. “Effects of spectral envelope smearing on speech reception.
I,” Journal of the Acoustical Society of America, 91(5):2872-2880 (May 1992).

30. ter Keurs, Mariken, et al. “Effects of spectral envelope smearing on speech reception.
II,” Journal of the Acoustical Society of America, 93(3):1547-1552 (March 1993).

31. Tobias, Jerry V., editor. Foundations of Modern Auditory Theory, I. New York:
Academic  Press,  1970. 

32.  Tonndorf,  Juergen.  Foundations  of  Modern  Auditory  Theory,  I,  chapter  6.  New  York: 
Academic  Press,  1970.  Jerry  V.  Tobias,  editor. 


33. Ward, N. F. Dudley and Jonathan R. Partington. “Rational Wavelet Decompositions
of Transfer Functions in Hardy-Sobolev Classes,” Mathematics of Controls, Signals,
and Systems, 8(3):257-78 (1995).



REPORT  DOCUMENTATION  PAGE 


Form Approved
OMB No. 0704-0188




1. AGENCY USE ONLY (Leave blank)


2. REPORT DATE

5 June 1998


3. REPORT TYPE AND DATES COVERED

Doctoral Dissertation


4. TITLE AND SUBTITLE

Representations, Approximations and Algorithms for Mathematical Speech Processing


5. FUNDING NUMBERS


6. AUTHOR(S)

Laura R. C. Suzuki, Major, USAF


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Department of Mathematics and Statistics
Air Force Institute of Technology
2950 P Street, Bldg 640
Wright-Patterson AFB, OH 45433-7765


8. PERFORMING ORGANIZATION REPORT NUMBER

AFIT/DS/ENC/98J-1


9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Dr. Jon Sjogren
AFOSR/NM
110 Duncan Ave B115
Bolling AFB, DC 20332-8050


10. SPONSORING/MONITORING AGENCY REPORT NUMBER


12a. DISTRIBUTION AVAILABILITY STATEMENT

Approved for public release; distribution unlimited


12b. DISTRIBUTION CODE


13.  ABSTRACT  (Maximum  200  words) 

Representing speech signals such that specific characteristics of speech are included is essential in many Air Force and DoD
signal processing applications. A mathematical construct called a frame is presented which captures the important
time-varying characteristic of speech. Roughly speaking, frames generalize the idea of an orthogonal basis in a Hilbert space.
Specific spaces applicable to speech are L_2(R) and the Hardy spaces H_p(D) for p > 1, where D is the unit disk in the
complex plane. Results are given for representations in the Hardy spaces involving Carleson's inequality (and its
extensions), frames, and hybrid frames, as well as for representations in L_2(R). Examples of different speech signals are
given, and the frame representations are applied to them to demonstrate the robustness and adaptiveness of the approach
while using very few coefficients in the approximation. Thus speech data could be compressed for processing, transmission,
and storage while still keeping the fidelity of the signal.


14.  SUBJECT  TERMS 


Speech  modeling,  Wavelets,  Frames,  Hardy  spaces,  Carleson's  inequality 


15.  NUMBER  OF  PAGES 

252 


16.  PRICE  CODE 


17.  SECURITY  CLASSIFICATION 
OF  REPORT 

UNCLASSIFIED 


18.  SECURITY  CLASSIFICATION 
OF  THIS  PAGE 

UNCLASSIFIED 


19. SECURITY CLASSIFICATION
OF ABSTRACT

UNCLASSIFIED


20. LIMITATION OF ABSTRACT


Standard Form 298 (Rev. 2-89) (EG)

Prescribed  by  ANSI  Std.  239.18 



